VideoKR: Towards Knowledge- and Reasoning-Intensive Video Understanding Paper • 2606.05259 • Published 5 days ago • 34
FineVerify: Scaling Test-Time Compute with Fine-Grained Self-Verification for Agentic Search Paper • 2606.00660 • Published 9 days ago • 8
A Survey of Reasoning-Intensive Retrieval: Progress and Challenges Paper • 2605.00063 • Published Apr 30
OpenComputer: Verifiable Software Worlds for Computer-Use Agents Paper • 2605.19769 • Published 20 days ago • 81
P-FOLIO: Evaluating and Improving Logical Reasoning with Abundant Human-Written Reasoning Chains Paper • 2410.09207 • Published Oct 11, 2024
Analyzing Diffusion and Autoregressive Vision Language Models in Multimodal Embedding Space Paper • 2602.06056 • Published Jan 19
TexOCR: Advancing Document OCR Models for Compilable Page-to-LaTeX Reconstruction Paper • 2604.22880 • Published Apr 24 • 10
A Survey of Multimodal Mathematical Reasoning: From Perception, Alignment to Reasoning Paper • 2603.08291 • Published Apr 14
Rethinking Reasoning-Intensive Retrieval: Evaluating and Advancing Retrievers in Agentic Search Systems Paper • 2605.04018 • Published May 5 • 40
Learning to Learn-at-Test-Time: Language Agents with Learnable Adaptation Policies Paper • 2604.00830 • Published Apr 2 • 15
RbtAct: Rebuttal as Supervision for Actionable Review Feedback Generation Paper • 2603.09723 • Published Mar 10 • 7
AbGen: Evaluating Large Language Models in Ablation Study Design and Evaluation for Scientific Research Paper • 2507.13300 • Published Jul 17, 2025 • 20
PuzzlePlex: Benchmarking Foundation Models on Reasoning and Planning with Puzzles Paper • 2510.06475 • Published Oct 7, 2025 • 2