Why Fine-Tuning Encourages Hallucinations and How to Fix It Paper • 2604.15574 • Published Apr 16 • 24
Tuna-2: Pixel Embeddings Beat Vision Encoders for Multimodal Understanding and Generation Paper • 2604.24763 • Published Apr 27 • 71
Programming with Data: Test-Driven Data Engineering for Self-Improving LLMs from Raw Corpora Paper • 2604.24819 • Published Apr 27 • 89
GLM-5V-Turbo: Toward a Native Foundation Model for Multimodal Agents Paper • 2604.26752 • Published 29 days ago • 108
Accelerating RL Post-Training Rollouts via System-Integrated Speculative Decoding Paper • 2604.26779 • Published 29 days ago • 13
Gated DeltaNet-2: Decoupling Erase and Write in Linear Attention Paper • 2605.22791 • Published 7 days ago • 30
Full Attention Strikes Back: Transferring Full Attention into Sparse within Hundred Training Steps Paper • 2605.16928 • Published 12 days ago • 93
From Seeing to Thinking: Decoupling Perception and Reasoning Improves Post-Training of Vision-Language Models Paper • 2605.20177 • Published 9 days ago • 9
Rethinking Muon Beyond Pretraining: Spectral Failures and High-Pass Remedies for VLA and RLVR Paper • 2605.19282 • Published 9 days ago • 7
Negligible in Size, Significant in Effect: On Scale Vectors in Large Language Models Paper • 2605.26895 • Published 2 days ago • 14