LocateAnything: Fast and High-Quality Vision-Language Grounding with Parallel Box Decoding Paper • 2605.27365 • Published 2 days ago
TideGS: Scalable Training of Over One Billion 3D Gaussian Splatting Primitives via Out-of-Core Optimization Paper • 2605.20150 • Published 9 days ago
RT-Splatting: Joint Reflection-Transmission Modeling with Gaussian Splatting Paper • 2605.18263 • Published 10 days ago
Aurora: Unified Video Editing with a Tool-Using Agent Paper • 2605.18748 • Published 10 days ago
UniT: Unified Geometry Learning with Group Autoregressive Transformer Paper • 2605.21131 • Published 8 days ago
RADIO-ViPE: Online Tightly Coupled Multi-Modal Fusion for Open-Vocabulary Semantic SLAM in Dynamic Environments Paper • 2604.26067 • Published 30 days ago
MoCapAnything V2: End-to-End Motion Capture for Arbitrary Skeletons Paper • 2604.28130 • Published 28 days ago
World-R1: Reinforcing 3D Constraints for Text-to-Video Generation Paper • 2604.24764 • Published about 1 month ago
Video Analysis and Generation via a Semantic Progress Function Paper • 2604.22554 • Published Apr 24
FlowAnchor: Stabilizing the Editing Signal for Inversion-Free Video Editing Paper • 2604.22586 • Published Apr 24
Agentic World Modeling: Foundations, Capabilities, Laws, and Beyond Paper • 2604.22748 • Published Apr 24
StyleID: A Perception-Aware Dataset and Metric for Stylization-Agnostic Facial Identity Recognition Paper • 2604.21689 • Published Apr 23
LLaTiSA: Towards Difficulty-Stratified Time Series Reasoning from Visual Perception to Semantics Paper • 2604.17295 • Published Apr 19
WorldMark: A Unified Benchmark Suite for Interactive Video World Models Paper • 2604.21686 • Published Apr 23
Seedance 2.0: Advancing Video Generation for World Complexity Paper • 2604.14148 • Published Apr 15
Learning Long-term Motion Embeddings for Efficient Kinematics Generation Paper • 2604.11737 • Published Apr 13