# Qwen3.6-35B-A3B - GGUF
This repository contains GGUF format model files for Qwen/Qwen3.6-35B-A3B.
## Available Files
| File Name | Format | Size | Description |
|---|---|---|---|
| Qwen3.6-35B-A3B-Q4_K_M.gguf | Q4_K_M | ~20.5 GB | Recommended. 4-bit quantization using the K-quants method. Offers an excellent balance between inference speed, memory consumption, and model quality. |
| Qwen3.6-35B-A3B-F16.gguf | FP16 | ~70.0 GB | Unquantized 16-bit precision. Extremely large file size. Mainly provided for researchers or users who wish to apply their own custom quantization methods. |
Note: Please check the exact file sizes in the "Files and versions" tab.
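The F16 file is mainly a starting point for making your own quantizations. As a minimal sketch, assuming a llama.cpp build that includes the `llama-quantize` tool, re-quantizing it to another K-quant type could look like this (the output file name and the `Q5_K_M` type are only examples):

```bash
# Re-quantize the unquantized F16 file to an example K-quant type
llama-quantize Qwen3.6-35B-A3B-F16.gguf Qwen3.6-35B-A3B-Q5_K_M.gguf Q5_K_M
```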
## Hardware & Conversion Environment
Unlike most conversions, which are performed on Nvidia hardware, this GGUF conversion was executed on an AMD APU platform leveraging a massive Unified Memory Architecture (UMA) pool:
- Processor: AMD Ryzen AI MAX+ 395 w/ Radeon 8060S
- Memory Configuration: 128 GB Unified Memory
- Toolchain: `llama.cpp` compiled specifically for Windows ROCm (gfx1151 architecture)
This unified pool allowed the 70 GB+ of unquantized FP16 tensors to be handled and converted directly in system RAM before being quantized on the APU's compute units.
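For reference, a comparable toolchain can be built from source with llama.cpp's HIP/ROCm backend. The sketch below is an assumption rather than the exact build used here: CMake option names vary between llama.cpp versions, and a Windows build additionally requires AMD's HIP SDK to be installed:

```bash
# Build llama.cpp with the ROCm/HIP backend, targeting the gfx1151 architecture
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
cmake -B build -DGGML_HIP=ON -DAMDGPU_TARGETS=gfx1151 -DCMAKE_BUILD_TYPE=Release
cmake --build build --config Release
```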
## How to Run with llama.cpp
You can run this model locally using the llama.cpp CLI. First download the Q4_K_M file (one way is sketched below), then run the basic generation command that follows.
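For example, Hugging Face's `huggingface-cli` can fetch a single file; the sketch below assumes this repository's ID and the file name from the table above:

```bash
# Download only the Q4_K_M file into the current directory
huggingface-cli download sharpcaterpillar/Qwen3.6-35B-A3B-GGUF \
  Qwen3.6-35B-A3B-Q4_K_M.gguf --local-dir .
```

Then point `llama-cli` at the downloaded file: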
```bash
# Basic chat/generation example
llama-cli.exe -m Qwen3.6-35B-A3B-Q4_K_M.gguf -p "You are a helpful assistant. Please explain the theory of relativity." -n 1024 -c 4096 -ngl 99
```
Parameter explanations:
- `-m`: Path to the downloaded GGUF model.
- `-p`: Your prompt.
- `-n`: Maximum number of tokens to predict/generate.
- `-c`: Context window size (adjust based on your available memory).
- `-ngl`: Number of layers to offload to the GPU. For 35B models, set this to a high number (like 99) if you have 24 GB+ of VRAM to fully offload the model for maximum speed.
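If you prefer an HTTP API over the interactive CLI, llama.cpp also ships `llama-server`, which exposes an OpenAI-compatible endpoint. A minimal sketch, with the port and context size as arbitrary examples:

```bash
# Start an OpenAI-compatible HTTP server on port 8080
llama-server.exe -m Qwen3.6-35B-A3B-Q4_K_M.gguf -c 4096 -ngl 99 --port 8080

# Query it from another terminal
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Explain the theory of relativity."}]}'
```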
## Acknowledgements
All credit for the original model architecture and weights goes to the Qwen Team.