MMRefine: Unveiling the Obstacles to Robust Refinement in Multimodal Large Language Models Paper • 2506.04688 • Published Jun 5, 2025 • 3
Does Audio Matter for Modern Video-LLMs and Their Benchmarks? Paper • 2509.17901 • Published Sep 22, 2025
MambaMia: A State-Space-Model-Based Compression for Efficient Video Understanding in Large Multimodal Models Paper • 2506.13564 • Published Jun 16, 2025
K-BrowseComp: A Web Browsing Agent Benchmark Grounded in Korean Contexts Paper • 2606.02404 • Published 5 days ago • 53
Decentralized Instruction Tuning: Conflict-Aware Splitting and Weight Merging Paper • 2606.01717 • Published 5 days ago • 20
DynaFLIP: Rethinking Robotics Perception via Tri-Modal-Dynamics Guided Representation Paper • 2605.30350 • Published 9 days ago • 12
Contrastive Distribution Matching for Amortized Sequential Monte Carlo in Discrete Diffusion Paper • 2605.23346 • Published 15 days ago
optimize_anything: A Universal API for Optimizing any Text Parameter Paper • 2605.19633 • Published 18 days ago • 6
Sparse Mixture-of-Experts are Domain Generalizable Learners Paper • 2206.04046 • Published Jun 8, 2022 • 1
Unsolvable Problem Detection: Evaluating Trustworthiness of Vision Language Models Paper • 2403.20331 • Published Mar 29, 2024 • 16
LMMs-Eval: Reality Check on the Evaluation of Large Multimodal Models Paper • 2407.12772 • Published Jul 17, 2024 • 35
Generalized Out-of-Distribution Detection and Beyond in Vision Language Model Era: A Survey Paper • 2407.21794 • Published Jul 31, 2024 • 6
Ego-R1: Chain-of-Tool-Thought for Ultra-Long Egocentric Video Reasoning Paper • 2506.13654 • Published Jun 16, 2025 • 44
VideoLucy: Deep Memory Backtracking for Long Video Understanding Paper • 2510.12422 • Published Oct 14, 2025 • 1
HippoCamp: Benchmarking Contextual Agents on Personal Computers Paper • 2604.01221 • Published Apr 1 • 30