Lens: Rethinking Training Efficiency for Foundational Text-to-Image Models • Paper • 2605.21573 • Published 7 days ago • 96
LiteFrame: Efficient Vision Encoders Unlock Frame Scaling in Video LLMs • Paper • 2605.17260 • Published 10 days ago • 24
KVPO: ODE-Native GRPO for Autoregressive Video Alignment via KV Semantic Exploration • Paper • 2605.14278 • Published 13 days ago • 37
Lance: Unified Multimodal Modeling by Multi-Task Synergy • Paper • 2605.18678 • Published 9 days ago • 74
LongLive-2.0: An NVFP4 Parallel Infrastructure for Long Video Generation • Paper • 2605.18739 • Published 9 days ago • 109
DiffusionOPD: A Unified Perspective of On-Policy Distillation in Diffusion Models • Paper • 2605.15055 • Published 13 days ago • 19
Causal Forcing++: Scalable Few-Step Autoregressive Diffusion Distillation for Real-Time Interactive Video Generation • Paper • 2605.15141 • Published 13 days ago • 91
AnyFlow: Any-Step Video Diffusion Model with On-Policy Flow Map Distillation • Paper • 2605.13724 • Published 14 days ago • 96
World Action Models: The Next Frontier in Embodied AI • Paper • 2605.12090 • Published 15 days ago • 66
SenseNova-U1: Unifying Multimodal Understanding and Generation with NEO-unify Architecture • Paper • 2605.12500 • Published 15 days ago • 186
Continuous-Time Distribution Matching for Few-Step Diffusion Distillation • Paper • 2605.06376 • Published 20 days ago • 26
Stream-R1: Reliability-Perplexity Aware Reward Distillation for Streaming Video Generation • Paper • 2605.03849 • Published 22 days ago • 124