OmniShow: Unifying Multimodal Conditions for Human-Object Interaction Video Generation Paper • 2604.11804 • Published 9 days ago • 69
Gen-Searcher: Reinforcing Agentic Search for Image Generation Paper • 2603.28767 • Published 22 days ago • 57
HopChain: Multi-Hop Data Synthesis for Generalizable Vision-Language Reasoning Paper • 2603.17024 • Published Mar 17 • 109
Optimizing Few-Step Generation with Adaptive Matching Distillation Paper • 2602.07345 • Published Feb 7 • 9
Late-to-Early Training: LET LLMs Learn Earlier, So Faster and Better Paper • 2602.05393 • Published Feb 5 • 8
PISA: Piecewise Sparse Attention Is Wiser for Efficient Diffusion Transformers Paper • 2602.01077 • Published Feb 1 • 4
Next-Embedding Prediction Makes Strong Vision Learners Paper • 2512.16922 • Published Dec 18, 2025 • 89
StereoPilot: Learning Unified and Efficient Stereo Conversion via Generative Priors Paper • 2512.16915 • Published Dec 18, 2025 • 38
DeCo: Frequency-Decoupled Pixel Diffusion for End-to-End Image Generation Paper • 2511.19365 • Published Nov 24, 2025 • 66
UltraFlux: Data-Model Co-Design for High-quality Native 4K Text-to-Image Generation across Diverse Aspect Ratios Paper • 2511.18050 • Published Nov 22, 2025 • 38