Seedance 2.0: Advancing Video Generation for World Complexity Paper • 2604.14148 • Published 6 days ago • 147
Turing Test on Screen: A Benchmark for Mobile GUI Agent Humanization Paper • 2604.09574 • Published Feb 24 • 30
Rethinking On-Policy Distillation of Large Language Models: Phenomenology, Mechanism, and Recipe Paper • 2604.13016 • Published 7 days ago • 82
jasonrqh/Math-CoT-44k-Qwen3-32b-n32-16384-with-logprob-and-entropy Viewer • Updated 9 days ago • 44.4k • 2.09k • 1
Code2Math: Can Your Code Agent Effectively Evolve Math Problems Through Exploration? Paper • 2603.03202 • Published Mar 3 • 17
ATBench: A Diverse and Realistic Trajectory Benchmark for Long-Horizon Agent Safety Paper • 2604.02022 • Published 19 days ago • 15
Rethinking Generalization in Reasoning SFT: A Conditional Analysis on Optimization, Data, and Model Capability Paper • 2604.06628 • Published 13 days ago • 317 • 7
Rethink_SFT_generalization Collection Repo for paper Rethinking Generalization in Reasoning SFT: A Conditional Analysis on Optimization, Data, and Model Capability. • 40 items • Updated 9 days ago • 16
Article Red Teaming with RL: Exploiting Tinker API for Harmful RL on 235B Model Jan 1 • 19