Qwen3-Omni-30B-A3B-FP8

Block-wise FP8 quantization of Qwen/Qwen3-Omni-30B-A3B-Instruct.

Model Details

Property        Value
Original Size   70.52 GB
Quantized Size  35 GB
Compression     1.89x
Quantization    Block-wise FP8 E4M3 (128x128 blocks)
Format          SafeTensors with weight_scale_inv scales

Components

Quantized to FP8:

  • Thinker (48 layers MoE) - main language model
  • Talker (20 layers MoE) - audio generation model

Kept in BF16:

  • Vision encoder (thinker.visual)
  • Audio tower (thinker.audio_tower)
  • Code2Wav decoder (code2wav)
  • Embedding layers
  • LayerNorm layers
  • MoE gate routing layers
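The split above can be expressed as a name-based filter over parameter names. The following is a minimal sketch, not the script used to produce this checkpoint. The three module prefixes come from the list above; the substring patterns for embeddings, norms, and MoE routers, and the helper name `keeps_bf16`, are assumptions about typical Qwen-style parameter naming.

```python
# Prefixes taken from the component list above; these modules stay in BF16.
BF16_PREFIXES = ("thinker.visual.", "thinker.audio_tower.", "code2wav.")

# ASSUMED naming patterns (not confirmed by this card) for embeddings,
# normalization layers, and MoE router gates.
BF16_SUBSTRINGS = ("embed", "norm", ".mlp.gate.")


def keeps_bf16(param_name: str) -> bool:
    """Return True if a parameter would be left in BF16 under this sketch."""
    if param_name.startswith(BF16_PREFIXES):
        return True
    return any(pattern in param_name for pattern in BF16_SUBSTRINGS)


# Hypothetical parameter names, for illustration only.
examples = [
    "thinker.model.layers.0.mlp.experts.3.gate_proj.weight",  # expert weight
    "thinker.model.layers.0.mlp.gate.weight",                 # MoE router
    "thinker.visual.blocks.0.attn.qkv.weight",                # vision encoder
]
for name in examples:
    print(name, "->", "BF16" if keeps_bf16(name) else "FP8")
```

Note that the router pattern is `.mlp.gate.` rather than `gate`, so that expert projections such as `gate_proj` are still treated as quantizable.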

Usage with vLLM

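from vllm import LLM

# Load the FP8 checkpoint, sharded across two GPUs.
llm = LLM(
    model="marksverdhei/Qwen3-Omni-30B-A3B-FP8",
    tensor_parallel_size=2,        # split the model across 2 GPUs
    gpu_memory_utilization=0.85,   # fraction of each GPU's memory vLLM may use
    max_model_len=4096,            # cap context length to limit KV-cache memory
    trust_remote_code=True,
)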

Requirements

  • vLLM >= 0.13.0 with Qwen3-Omni support
  • 2x 24GB GPUs (e.g., RTX 3090) or equivalent
  • ~35 GB disk space
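To see why two 24 GB GPUs are roughly the minimum, here is a back-of-the-envelope estimate using the numbers from this card (35 GB of weights, 2 GPUs, 0.85 memory utilization). It assumes the weights shard evenly across tensor-parallel ranks and ignores the GB/GiB distinction, so treat the result as a rough guide rather than a measurement. The function name `per_gpu_budget` is illustrative.

```python
def per_gpu_budget(model_gb: float, num_gpus: int, gpu_gb: float, utilization: float):
    """Estimate per-GPU weight memory, usable memory, and remaining headroom (GB)."""
    weights_per_gpu = model_gb / num_gpus   # ASSUMES an even split across GPUs
    usable_per_gpu = gpu_gb * utilization   # memory vLLM is allowed to claim
    headroom = usable_per_gpu - weights_per_gpu
    return weights_per_gpu, usable_per_gpu, headroom


weights, usable, headroom = per_gpu_budget(35.0, 2, 24.0, 0.85)
print(f"{weights:.1f} GB weights, {usable:.1f} GB usable, {headroom:.1f} GB headroom per GPU")
```

Under these assumptions only a few GB per GPU remain for the KV cache and activations, which is consistent with the modest max_model_len used in the vLLM example.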

Quantization Details

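Block-wise quantization splits each weight matrix into 128x128 blocks and gives every block its own scale factor. Compared with per-tensor quantization, an outlier weight only reduces precision within its own block, while the storage overhead stays small: one scale per 16,384 weights. The scales are stored as weight_scale_inv (inverse scale for efficient multiplication during inference).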
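The idea can be illustrated with a small pure-Python sketch. It is not the code used to build this checkpoint. It assumes the common convention that a weight is recovered as FP8 value × weight_scale_inv, with each block's scale set to max|w| / 448 (448 being the largest finite E4M3 value). FP8 rounding is simulated with `round_to_e4m3`, a hypothetical helper; real pipelines would use a native FP8 dtype.

```python
import math

FP8_E4M3_MAX = 448.0  # largest finite value representable in E4M3


def round_to_e4m3(x: float) -> float:
    """Simulate round-to-nearest onto the E4M3 grid (illustrative only)."""
    if x == 0.0:
        return 0.0
    sign = -1.0 if x < 0 else 1.0
    mag = min(abs(x), FP8_E4M3_MAX)
    _, exp = math.frexp(mag)  # mag = m * 2**exp with 0.5 <= m < 1
    # 4 significant bits (1 implicit + 3 mantissa); subnormals use step 2**-9.
    step = 2.0 ** (max(exp, -5) - 4)
    return sign * round(mag / step) * step


def quantize_blockwise(weight, block=128):
    """Return (quantized values, per-block scale_inv) for a 2-D list of floats."""
    rows, cols = len(weight), len(weight[0])
    n_br, n_bc = math.ceil(rows / block), math.ceil(cols / block)
    scale_inv = [[1.0] * n_bc for _ in range(n_br)]
    quantized = [[0.0] * cols for _ in range(rows)]
    for br in range(n_br):
        for bc in range(n_bc):
            r_range = range(br * block, min((br + 1) * block, rows))
            c_range = range(bc * block, min((bc + 1) * block, cols))
            amax = max(abs(weight[r][c]) for r in r_range for c in c_range)
            s = amax / FP8_E4M3_MAX if amax > 0 else 1.0
            scale_inv[br][bc] = s
            for r in r_range:
                for c in c_range:
                    quantized[r][c] = round_to_e4m3(weight[r][c] / s)
    return quantized, scale_inv


def dequantize_blockwise(quantized, scale_inv, block=128):
    """Recover approximate weights: value * scale_inv of the enclosing block."""
    return [
        [q * scale_inv[r // block][c // block] for c, q in enumerate(row)]
        for r, row in enumerate(quantized)
    ]


# Tiny demo with 2x2 blocks so that each block has a very different magnitude.
demo = [
    [1.0, 0.5, 100.0, 50.0],
    [0.25, 0.75, 25.0, 75.0],
    [0.01, 0.02, 3.0, 1.0],
    [0.03, 0.04, 2.0, 4.0],
]
q, s_inv = quantize_blockwise(demo, block=2)
restored = dequantize_blockwise(q, s_inv, block=2)
```

In the demo, the block holding values around 0.01 gets its own small scale, so it keeps its relative precision even though another block contains values near 100. With a single per-tensor scale, those small values would be pushed toward the bottom of the FP8 range.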

Original Model

This is a quantized version of Qwen/Qwen3-Omni-30B-A3B-Instruct.

Qwen3-Omni is a natively end-to-end multilingual omni-modal foundation model that processes text, images, audio, and video, and delivers real-time streaming responses in both text and natural speech.

Key Features:

  • State-of-the-art across modalities
  • Supports 119 text languages, 19 speech input languages, and 10 speech output languages
  • MoE-based Thinker-Talker architecture
  • Real-time audio/video interaction

For full details, see the original model card.
