video/image
google/vit-base-patch16-224
Image Classification
• 86.6M • Updated • 4.85M • 965
OpenGVLab/internimage_g_jointto22k_384
Image Classification
• 3B • Updated • 49 • 1
chancharikm/qwen2.5-vl-72b-cam-motion
Video-Text-to-Text
• 73B • Updated • 14 • 1
Text Generation
• 2B • Updated • 2.56k • 90
• Updated • 74 • 1
• Updated • 27.1k • 4.85k • 3
• Updated • 900 • 2.61k • 14
moonshotai/Kimi-VL-A3B-Thinking-2506
Image-Text-to-Text
• 16B • Updated • 11.8k • 360
lmms-lab/llava-critic-113k
• Updated • 113k • 951 • 28
lmms-lab/M4-Instruct-Data
• Updated • 1.39k • 78
lmms-lab/llava-next-interleave-qwen-7b
Text Generation
• 8B • Updated • 138 • 27
lmms-lab/LLaVA-OneVision-Data
• Updated • 3.94M • 14.8k • 236
• Updated • 19.2k • 57
• Updated • 12.5k • 29
Multimodal Attention Merging for Improved Speech Recognition and Audio Event Classification
Paper • 2312.14378
avalab/cTBLS_knowledge_retriever
• Updated
CraftJarvis/minecraft-vla-sft
• Updated • 3.78M • 2.07k • 10