# Qwen2.5-Omni-7B Decoder-Only (GGUF)

Text decoder extracted from Qwen/Qwen2.5-Omni-7B, with all vision, audio, talker, and token2wav components removed.

This is a pure text LLM (7.62B params) that runs standalone in llama.cpp without any multimodal dependencies.

## Model Details

| Parameter | Value |
|---|---|
| Architecture | Qwen2VL (text decoder only) |
| Parameters | 7.62B |
| Hidden size | 3584 |
| Layers | 28 |
| Attention heads | 28 (4 KV heads, GQA) |
| FFN intermediate size | 18944 |
| Vocab size | 152064 |
| Max context | 32768 |
| RoPE base | 1000000 |
| Tokenizer | GPT-2-style BPE |
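As a sanity check, the dimensions above reproduce the 7.62B parameter count almost exactly. The sketch below assumes standard Qwen2.5 conventions (biases on the Q/K/V projections only, untied input/output embeddings, RMSNorm weight vectors); it is a back-of-envelope estimate, not an authoritative accounting:

```python
# Back-of-envelope parameter count from the table above.
# Assumes Qwen2.5 conventions: biases on Q/K/V only, untied
# input/output embeddings, one weight vector per RMSNorm.
hidden, layers, heads, kv_heads = 3584, 28, 28, 4
ffn, vocab = 18944, 152064
head_dim = hidden // heads             # 128
kv_dim = kv_heads * head_dim           # 512 (GQA: 4 KV heads)

attn = (hidden * hidden + hidden) \
     + 2 * (hidden * kv_dim + kv_dim) \
     + hidden * hidden                 # Q, K, V (with bias), O
mlp = 3 * hidden * ffn                 # gate, up, down projections
norms = 2 * hidden                     # pre-attn + pre-MLP RMSNorm
per_layer = attn + mlp + norms

total = layers * per_layer \
      + 2 * vocab * hidden \
      + hidden                         # embeddings, lm_head, final norm
print(f"{total / 1e9:.2f}B")           # -> 7.62B
```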

## Files

| File | Size | BPW | Description |
|---|---|---|---|
| Qwen2.5-Omni-7B-decoder-Q8_0.gguf | 7.6 GB | 8.50 | Q8_0 quantized |
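The 8.50 BPW figure follows directly from the Q8_0 block layout in llama.cpp's GGML quantization scheme: each block of 32 weights stores 32 int8 values plus one fp16 scale.

```python
# Q8_0: blocks of 32 weights, each stored as int8,
# plus one fp16 scale per block -> 34 bytes per 32 weights.
block_weights = 32
block_bytes = 32 * 1 + 2               # int8 weights + fp16 scale
bpw = block_bytes * 8 / block_weights
print(bpw)                             # -> 8.5
```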

## How It Was Made

Extracted using llama.cpp's `convert_hf_to_gguf.py`, which automatically:

1. Strips the `thinker.` prefix from weight names
2. Drops all `visual.*`, `audio.*`, `talker.*`, and `token2wav.*` tensors
3. Outputs a standard Qwen2.5 text-decoder GGUF
```bash
# Step 1: Extract decoder to F16
python convert_hf_to_gguf.py Qwen/Qwen2.5-Omni-7B \
    --outfile Qwen2.5-Omni-7B-decoder-F16.gguf --outtype f16

# Step 2: Quantize to Q8_0
llama-quantize Qwen2.5-Omni-7B-decoder-F16.gguf \
    Qwen2.5-Omni-7B-decoder-Q8_0.gguf Q8_0
```
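The renaming and filtering steps can be illustrated with a small sketch. This is a hypothetical helper showing the idea, not the actual `convert_hf_to_gguf.py` implementation:

```python
# Illustrative sketch of the prefix-stripping / tensor-dropping logic;
# not the actual convert_hf_to_gguf.py code.
DROPPED = ("visual.", "audio.", "talker.", "token2wav.")

def extract_decoder(state_dict: dict) -> dict:
    out = {}
    for name, tensor in state_dict.items():
        if name.startswith(DROPPED):
            continue                        # multimodal component: drop
        if name.startswith("thinker."):
            name = name[len("thinker."):]   # keep as plain decoder weight
        out[name] = tensor
    return out

weights = {
    "thinker.model.layers.0.self_attn.q_proj.weight": "W",
    "visual.blocks.0.attn.qkv.weight": "V",
    "talker.model.layers.0.mlp.gate_proj.weight": "T",
}
print(sorted(extract_decoder(weights)))
# -> ['model.layers.0.self_attn.q_proj.weight']
```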

## Usage with llama.cpp

```bash
# Benchmark
./llama-bench -m Qwen2.5-Omni-7B-decoder-Q8_0.gguf -t 6 -p 512 -n 128 -fa 1

# Text generation
./llama-cli -m Qwen2.5-Omni-7B-decoder-Q8_0.gguf -p "Hello" -n 200

# Further quantize locally
llama-quantize Qwen2.5-Omni-7B-decoder-Q8_0.gguf \
    Qwen2.5-Omni-7B-decoder-Q4_0.gguf Q4_0
```
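Expected sizes for further quantizations can be estimated from each format's bits per weight; Q4_0, for example, packs 32 four-bit weights plus an fp16 scale into 18 bytes (4.5 BPW). A rough estimate, ignoring that some tensors may be kept at higher precision in the actual file:

```python
# Rough file-size estimate from bits-per-weight; real GGUF files
# differ slightly because some tensors keep higher precision.
params = 7.62e9
for name, bpw in [("Q8_0", 8.5), ("Q4_0", 4.5)]:
    gb = params * bpw / 8 / 1e9
    print(f"{name}: ~{gb:.1f} GB")
```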

## Component Breakdown (Full Omni Model)

The full Qwen2.5-Omni-7B (10.73B params) consists of:

| Component | Params | Description |
|---|---|---|
| Decoder (this repo) | 7.62B | Text LLM |
| Vision encoder | 0.68B | ViT (32 layers) |
| Audio encoder | 0.64B | Whisper-style (32 layers) |
| Talker | 1.35B | Speech decoder (24 layers) |
| Token2Wav | 0.45B | DiT + BigVGAN vocoder |
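Summing the per-component figures recovers the full model's 10.73B total; the small discrepancy below is rounding in the table:

```python
# Components of the full Qwen2.5-Omni-7B, in billions of parameters.
components = {
    "decoder": 7.62,
    "vision_encoder": 0.68,
    "audio_encoder": 0.64,
    "talker": 1.35,
    "token2wav": 0.45,
}
total = sum(components.values())
print(f"{total:.2f}B")   # matches the stated 10.73B up to rounding
```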

## License

Apache 2.0 (same as base model)
