Qwen3-VL-Embedding and Qwen3-VL-Reranker: A Unified Framework for State-of-the-Art Multimodal Retrieval and Ranking
Paper: arXiv:2601.04720
This is the MLX 4-bit quantized version of Qwen/Qwen3-VL-Reranker-8B, optimized for inference on Apple Silicon (Mac / iPad / iPhone) using the MLX framework.
| Config | Value |
|---|---|
| Bits | 4 |
| Group Size | 64 |
| Quantization Mode | Affine |
| Dtype | bfloat16 |
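The settings in the table above map onto the quantization options of mlx-lm's `convert` utility. The command below is a sketch of how a quantization with these parameters could be reproduced from the original weights; the output path is a placeholder, and flag availability should be checked against the installed mlx-lm version:

```shell
# Re-create a 4-bit, group-size-64 affine quantization from the original
# bfloat16 weights. The --mlx-path output directory is a placeholder.
python -m mlx_lm.convert \
    --hf-path Qwen/Qwen3-VL-Reranker-8B \
    --mlx-path ./Qwen3-VL-Reranker-8B-MLX-4bit \
    -q --q-bits 4 --q-group-size 64
```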
Install the required packages:

```bash
pip install mlx-lm transformers
```

Load the model with `mlx-lm`:

```python
from mlx_lm import load

model, tokenizer = load("Zeknes/Qwen3-VL-Reranker-8B-MLX-4bit")
```
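A reranker scores a query-document pair rather than generating free-form text. The exact prompt template for Qwen3-VL-Reranker is defined on the original model page; the helper below, its template markers, and the default instruction are assumptions sketching how such an input could be assembled before tokenization:

```python
# Hypothetical sketch: build a text input for pairwise reranking.
# The template markers and default instruction below are assumptions;
# consult the original Qwen3-VL-Reranker-8B model page for the real format.

def format_rerank_input(
    query: str,
    document: str,
    instruction: str = (
        "Given a web search query, retrieve relevant passages "
        "that answer the query."
    ),
) -> str:
    """Combine instruction, query, and document into one prompt string."""
    return (
        f"<Instruct>: {instruction}\n"
        f"<Query>: {query}\n"
        f"<Document>: {document}"
    )

prompt = format_rerank_input(
    "what is MLX?",
    "MLX is an array framework for Apple Silicon.",
)
# The resulting string would then be tokenized and scored by the model,
# e.g. by comparing the logits of its relevance-judgment tokens.
```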
For full usage examples (multimodal reranking, vLLM), please refer to the original model page: Qwen3-VL-Reranker-8B
```bibtex
@article{qwen3vlembedding,
  title={Qwen3-VL-Embedding and Qwen3-VL-Reranker: A Unified Framework for State-of-the-Art Multimodal Retrieval and Ranking},
  author={Li, Mingxin and Zhang, Yanzhao and Long, Dingkun and Chen, Keqin and Song, Sibo and Bai, Shuai and Yang, Zhibo and Xie, Pengjun and Yang, An and Liu, Dayiheng and Zhou, Jingren and Lin, Junyang},
  journal={arXiv preprint arXiv:2601.04720},
  year={2026}
}
```