PiD: Fast and High-Resolution Latent Decoding with Pixel Diffusion Paper • 2605.23902 • Published 4 days ago • 26
FramePrompt: In-context Controllable Animation with Zero Structural Changes Paper • 2506.17301 • Published Jun 17, 2025
AnyFlow: Any-Step Video Diffusion Model with On-Policy Flow Map Distillation Paper • 2605.13724 • Published 13 days ago • 96
Cosmos-Transfer1: Conditional World Generation with Adaptive Multimodal Control Paper • 2503.14492 • Published Mar 18, 2025 • 20
Cosmos-Drive-Dreams: Scalable Synthetic Driving Data Generation with World Foundation Models Paper • 2506.09042 • Published Jun 10, 2025 • 5
ChronoEdit: Towards Temporal Reasoning for Image Editing and World Simulation Paper • 2510.04290 • Published Oct 5, 2025 • 21
VideoReasonBench: Can MLLMs Perform Vision-Centric Complex Video Reasoning? Paper • 2505.23359 • Published May 29, 2025 • 38
Teaching LMMs for Image Quality Scoring and Interpreting Paper • 2503.09197 • Published Mar 12, 2025 • 1
Generative Frame Sampler for Long Video Understanding Paper • 2503.09146 • Published Mar 12, 2025 • 1
Cosmos World Foundation Model Platform for Physical AI Paper • 2501.03575 • Published Jan 7, 2025 • 82
Difix3D+: Improving 3D Reconstructions with Single-Step Diffusion Models Paper • 2503.01774 • Published Mar 3, 2025 • 44
Adaptive Image Quality Assessment via Teaching Large Multimodal Model to Compare Paper • 2405.19298 • Published May 29, 2024
VideoAutoArena: An Automated Arena for Evaluating Large Multimodal Models in Video Analysis through User Simulation Paper • 2411.13281 • Published Nov 20, 2024 • 21
Aria: An Open Multimodal Native Mixture-of-Experts Model Paper • 2410.05993 • Published Oct 8, 2024 • 111
Q-Bench: A Benchmark for General-Purpose Foundation Models on Low-level Vision Paper • 2309.14181 • Published Sep 25, 2023 • 2
Towards Explainable In-the-Wild Video Quality Assessment: A Database and a Language-Prompted Approach Paper • 2305.12726 • Published May 22, 2023