Anchored Policy Optimization: Mitigating Exploration Collapse Via Support-Constrained Rectification Paper • 2602.05717 • Published Feb 5 • 1
Running Agents 24 Croissant Checker - Dev 🔎 24 Validate Croissant dataset files for NeurIPS submissions
ClawGUI: A Unified Framework for Training, Evaluating, and Deploying GUI Agents Paper • 2604.11784 • Published Apr 13 • 143
RAD-2: Scaling Reinforcement Learning in a Generator-Discriminator Framework Paper • 2604.15308 • Published Apr 16 • 29
HY-World 2.0: A Multi-Modal World Model for Reconstructing, Generating, and Simulating 3D Worlds Paper • 2604.14268 • Published Apr 15 • 119
From P(y|x) to P(y): Investigating Reinforcement Learning in Pre-train Space Paper • 2604.14142 • Published Apr 15 • 29
Seedance 2.0: Advancing Video Generation for World Complexity Paper • 2604.14148 • Published Apr 15 • 161
KnowRL: Boosting LLM Reasoning via Reinforcement Learning with Minimal-Sufficient Knowledge Guidance Paper • 2604.12627 • Published Apr 14 • 100
From Word to World: Can Large Language Models be Implicit Text-based World Models? Paper • 2512.18832 • Published Dec 21, 2025 • 15