DelTA: Discriminative Token Credit Assignment for Reinforcement Learning from Verifiable Rewards Paper • 2605.21467 • Published 6 days ago • 201
parkjo/Qwen2.5-Math-1.5B_grpo_ppl_adv_entropy_rollout_8_KL_0.001_ent_0.001_USE_KL__step580 2B • Updated 4 days ago • 32 • 1
Solvita: Enhancing Large Language Models for Competitive Programming via Agentic Evolution Paper • 2605.15301 • Published 12 days ago • 22
PaperFit: Vision-in-the-Loop Typesetting Optimization for Scientific Documents Paper • 2605.10341 • Published 15 days ago • 34
Mean Mode Screaming: Mean–Variance Split Residuals for 1000-Layer Diffusion Transformers Paper • 2605.06169 • Published 19 days ago • 229
Perceptual Flow Network for Visually Grounded Reasoning Paper • 2605.02730 • Published 22 days ago • 7
The Continuity Layer: Why Intelligence Needs an Architecture for What It Carries Forward Paper • 2604.17273 • Published Apr 19 • 3
LLaDA2.0-Uni: Unifying Multimodal Understanding and Generation with Diffusion Large Language Model Paper • 2604.20796 • Published Apr 22 • 242
Benchmarking and Mechanistic Analysis of Vision-Language Models for Cross-Depiction Assembly Instruction Alignment Paper • 2604.00913 • Published Apr 1 • 4
CARLA-Air: Fly Drones Inside a CARLA World – A Unified Infrastructure for Air-Ground Embodied Intelligence Paper • 2603.28032 • Published Mar 30 • 342
FIPO: Eliciting Deep Reasoning with Future-KL Influenced Policy Optimization Paper • 2603.19835 • Published Mar 20 • 351
inference-optimization/gpt-oss-120b-from-qwen235b-then-self-ckpt4-speculator.eagle3 0.9B • Updated Apr 1 • 4 • 1
Astrolabe: Steering Forward-Process Reinforcement Learning for Distilled Autoregressive Video Models Paper • 2603.17051 • Published Mar 17 • 109