# Qwen2.5-Omni-7B Decoder-Only (GGUF)

Text decoder extracted from Qwen/Qwen2.5-Omni-7B, with all vision, audio, talker, and token2wav components removed.

This is a pure text LLM (7.62B params) that runs standalone in llama.cpp without any multimodal dependencies.

## Model Details

| Parameter | Value |
|---|---|
| Architecture | Qwen2VL (text decoder only) |
| Parameters | 7.62B |
| Hidden size | 3584 |
| Layers | 28 |
| Attention heads | 28 (4 KV heads, GQA) |
| FFN intermediate size | 18944 |
| Vocab size | 152064 |
| Max context | 32768 |
| RoPE base | 1000000 |
| Tokenizer | GPT-2-style BPE |
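As a sanity check, the dimensions above reproduce the 7.62B parameter count almost exactly. The sketch below assumes standard Qwen2.5 conventions (biases on the Q/K/V projections only, untied input/output embeddings, RMSNorm weight vectors); it is a back-of-envelope estimate, not an authoritative accounting:

```python
# Back-of-envelope parameter count from the table above.
# Assumes Qwen2.5 conventions: biases on Q/K/V only, untied
# input/output embeddings, one weight vector per RMSNorm.
hidden, layers, heads, kv_heads = 3584, 28, 28, 4
ffn, vocab = 18944, 152064
head_dim = hidden // heads             # 128
kv_dim = kv_heads * head_dim           # 512 (GQA: 4 KV heads)

attn = (hidden * hidden + hidden) \
     + 2 * (hidden * kv_dim + kv_dim) \
     + hidden * hidden                 # Q, K, V (with bias), O
mlp = 3 * hidden * ffn                 # gate, up, down projections
norms = 2 * hidden                     # pre-attn + pre-MLP RMSNorm
per_layer = attn + mlp + norms

total = layers * per_layer \
      + 2 * vocab * hidden \
      + hidden                         # embeddings, lm_head, final norm
print(f"{total / 1e9:.2f}B")           # -> 7.62B
```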

## Files

| File | Size | BPW | Description |
|---|---|---|---|
| Qwen2.5-Omni-7B-decoder-Q8_0.gguf | 7.6 GB | 8.50 | Q8_0 quantized |
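The 8.50 BPW figure follows directly from the Q8_0 block layout in llama.cpp's GGML quantization scheme: each block of 32 weights stores 32 int8 values plus one fp16 scale.

```python
# Q8_0: blocks of 32 weights, each stored as int8,
# plus one fp16 scale per block -> 34 bytes per 32 weights.
block_weights = 32
block_bytes = 32 * 1 + 2               # int8 weights + fp16 scale
bpw = block_bytes * 8 / block_weights
print(bpw)                             # -> 8.5
```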

## How It Was Made

Extracted using llama.cpp's `convert_hf_to_gguf.py`, which automatically:

1. Strips the `thinker.` prefix from weight names
2. Drops all `visual.*`, `audio.*`, `talker.*`, and `token2wav.*` tensors
3. Outputs a standard Qwen2.5 text-decoder GGUF
```bash
# Step 1: Extract decoder to F16
python convert_hf_to_gguf.py Qwen/Qwen2.5-Omni-7B \
    --outfile Qwen2.5-Omni-7B-decoder-F16.gguf --outtype f16

# Step 2: Quantize to Q8_0
llama-quantize Qwen2.5-Omni-7B-decoder-F16.gguf \
    Qwen2.5-Omni-7B-decoder-Q8_0.gguf Q8_0
```
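The renaming and filtering steps can be illustrated with a small sketch. This is a hypothetical helper showing the idea, not the actual `convert_hf_to_gguf.py` implementation:

```python
# Illustrative sketch of the prefix-stripping / tensor-dropping logic;
# not the actual convert_hf_to_gguf.py code.
DROPPED = ("visual.", "audio.", "talker.", "token2wav.")

def extract_decoder(state_dict: dict) -> dict:
    out = {}
    for name, tensor in state_dict.items():
        if name.startswith(DROPPED):
            continue                        # multimodal component: drop
        if name.startswith("thinker."):
            name = name[len("thinker."):]   # keep as plain decoder weight
        out[name] = tensor
    return out

weights = {
    "thinker.model.layers.0.self_attn.q_proj.weight": "W",
    "visual.blocks.0.attn.qkv.weight": "V",
    "talker.model.layers.0.mlp.gate_proj.weight": "T",
}
print(sorted(extract_decoder(weights)))
# -> ['model.layers.0.self_attn.q_proj.weight']
```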

## Usage with llama.cpp

```bash
# Benchmark
./llama-bench -m Qwen2.5-Omni-7B-decoder-Q8_0.gguf -t 6 -p 512 -n 128 -fa 1

# Text generation
./llama-cli -m Qwen2.5-Omni-7B-decoder-Q8_0.gguf -p "Hello" -n 200

# Further quantize locally
llama-quantize Qwen2.5-Omni-7B-decoder-Q8_0.gguf \
    Qwen2.5-Omni-7B-decoder-Q4_0.gguf Q4_0
```
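Expected sizes for further quantizations can be estimated from each format's bits per weight; Q4_0, for example, packs 32 four-bit weights plus an fp16 scale into 18 bytes (4.5 BPW). A rough estimate, ignoring that some tensors may be kept at higher precision in the actual file:

```python
# Rough file-size estimate from bits-per-weight; real GGUF files
# differ slightly because some tensors keep higher precision.
params = 7.62e9
for name, bpw in [("Q8_0", 8.5), ("Q4_0", 4.5)]:
    gb = params * bpw / 8 / 1e9
    print(f"{name}: ~{gb:.1f} GB")
```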

## Component Breakdown (Full Omni Model)

The full Qwen2.5-Omni-7B (10.73B params) consists of:

| Component | Params | Description |
|---|---|---|
| Decoder (this repo) | 7.62B | Text LLM |
| Vision encoder | 0.68B | ViT (32 layers) |
| Audio encoder | 0.64B | Whisper-style (32 layers) |
| Talker | 1.35B | Speech decoder (24 layers) |
| Token2Wav | 0.45B | DiT + BigVGAN vocoder |
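Summing the per-component figures recovers the full model's 10.73B total; the small discrepancy below is rounding in the table:

```python
# Components of the full Qwen2.5-Omni-7B, in billions of parameters.
components = {
    "decoder": 7.62,
    "vision_encoder": 0.68,
    "audio_encoder": 0.64,
    "talker": 1.35,
    "token2wav": 0.45,
}
total = sum(components.values())
print(f"{total:.2f}B")   # matches the stated 10.73B up to rounding
```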

## License

Apache 2.0 (same as base model)
