Learning to Act under Noise: Enhancing Agent Robustness via Noisy Environments Paper • 2605.27209 • Published 4 days ago • 11
VitaBench 2.0: Evaluating Personalized and Proactive Agents in Long-Term User Interactions Paper • 2605.27141 • Published 4 days ago • 14
GUI-CIDER: Mid-training GUI Agents via Causal Internalization and Density-aware Exemplar Reselection Paper • 2605.28534 • Published 3 days ago • 19
Skill0.5: Joint Skill Internalization and Utilization for Out-of-Distribution Generalization in Agentic Reinforcement Learning Paper • 2605.28424 • Published 3 days ago • 15
WBench: A Comprehensive Multi-turn Benchmark for Interactive Video World Model Evaluation Paper • 2605.25874 • Published 5 days ago • 97
LARY: A Latent Action Representation Yielding Benchmark for Generalizable Vision-to-Action Alignment Paper • 2604.11689 • Published Apr 13 • 21
SKILL0: In-Context Agentic Reinforcement Learning for Skill Internalization Paper • 2604.02268 • Published Apr 2 • 101
CoBA-RL: Capability-Oriented Budget Allocation for Reinforcement Learning in LLMs Paper • 2602.03048 • Published Feb 3 • 32
MemOCR: Layout-Aware Visual Memory for Efficient Long-Horizon Reasoning Paper • 2601.21468 • Published Jan 29 • 25
Look Back to Reason Forward: Revisitable Memory for Long-Context LLM Agents Paper • 2509.23040 • Published Sep 27, 2025 • 12
EvoCUA: Evolving Computer Use Agents via Learning from Scalable Synthetic Experience Paper • 2601.15876 • Published Jan 22 • 92
VitaBench: Benchmarking LLM Agents with Versatile Interactive Tasks in Real-world Applications Paper • 2509.26490 • Published Sep 30, 2025 • 21
UI-TARS-2 Technical Report: Advancing GUI Agent with Multi-Turn Reinforcement Learning Paper • 2509.02544 • Published Sep 2, 2025 • 128
ScienceBoard: Evaluating Multimodal Autonomous Agents in Realistic Scientific Workflows Paper • 2505.19897 • Published May 26, 2025 • 104
BenchMAX: A Comprehensive Multilingual Evaluation Suite for Large Language Models Paper • 2502.07346 • Published Feb 11, 2025 • 53
OS-ATLAS: A Foundation Action Model for Generalist GUI Agents Paper • 2410.23218 • Published Oct 30, 2024 • 49
AgentStore: Scalable Integration of Heterogeneous Agents As Specialized Generalist Computer Assistant Paper • 2410.18603 • Published Oct 24, 2024 • 32