SamoXXX's Collections

Vision LLM

updated May 5, 2025

A collection of the best vision LLMs to study and learn from.

  • rhymes-ai/Aria

    Image-Text-to-Text • 25B • Updated Apr 23, 2025 • 116k • 638

  • microsoft/OmniParser

    Image-Text-to-Text • Updated Dec 2, 2024 • 268 • 1.71k

  • jadechoghari/Ferret-UI-Gemma2b

    Image-Text-to-Text • 3B • Updated Oct 18, 2024 • 84 • 52

  • jadechoghari/Ferret-UI-Llama8b

    Image-Text-to-Text • 8B • Updated Jan 8, 2025 • 57 • 68

  • gpt-omni/mini-omni2

    Any-to-Any • Updated Oct 24, 2024 • 76 • 283

  • mPLUG/DocOwl2

    Image-Text-to-Text • 9B • Updated Sep 27, 2024 • 283 • 115

  • google/siglip-so400m-patch16-256-i18n

    Zero-Shot Image Classification • 1B • Updated Nov 18, 2024 • 386 • 31

  • openvla/openvla-7b

    Robotics • 8B • Updated Feb 17 • 1.77M • 220

  • NexaAI/OmniVLM-968M

    0.5B • Updated Aug 20, 2025 • 717 • 532

  • Qwen/Qwen2.5-VL-7B-Instruct

    Image-Text-to-Text • 8B • Updated Apr 6, 2025 • 5.03M • 1.55k

  • ByteDance-Seed/UI-TARS-7B-SFT

    Image-Text-to-Text • 8B • Updated Jan 25, 2025 • 4.33k • 180

  • moonshotai/Kimi-VL-A3B-Instruct

    Image-Text-to-Text • 16B • Updated Jan 30 • 295k • 259

  • reducto/RolmOCR

    Image-Text-to-Text • 8B • Updated Apr 2, 2025 • 256k • 586