# Qwen3.6-35B-A3B - GGUF
This repository contains GGUF format model files for Qwen/Qwen3.6-35B-A3B.
## Available Files
| File Name | Format | Size | Description |
|---|---|---|---|
| Qwen3.6-35B-A3B-Q4_K_M.gguf | Q4_K_M | ~20.5 GB | Recommended. 4-bit quantization using the K-quants method. Offers an excellent balance between inference speed, memory consumption, and model quality. |
| Qwen3.6-35B-A3B-F16.gguf | FP16 | ~70.0 GB | Unquantized 16-bit precision. Extremely large file size. Mainly provided for researchers or users who wish to apply their own custom quantization methods. |
Note: Please check the exact file sizes in the "Files and versions" tab.
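The F16 file is mainly a starting point for making your own quantizations. As a minimal sketch, assuming a llama.cpp build that includes the `llama-quantize` tool, re-quantizing it to another K-quant type could look like this (the output file name and the `Q5_K_M` type are only examples):

```bash
# Re-quantize the unquantized F16 file to an example K-quant type
llama-quantize Qwen3.6-35B-A3B-F16.gguf Qwen3.6-35B-A3B-Q5_K_M.gguf Q5_K_M
```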
## Hardware & Conversion Environment
Unlike most conversions, which are performed on Nvidia hardware, this GGUF conversion was executed on an AMD APU platform leveraging a massive Unified Memory Architecture (UMA) pool:
- Processor: AMD Ryzen AI MAX+ 395 w/ Radeon 8060S
- Memory Configuration: 128 GB Unified Memory
- Toolchain: `llama.cpp` compiled specifically for Windows ROCm (gfx1151 architecture)
This unified pool allowed the 70 GB+ of unquantized FP16 tensors to be handled and converted directly in system RAM before being quantized on the APU's compute units.
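For reference, a comparable toolchain can be built from source with llama.cpp's HIP/ROCm backend. The sketch below is an assumption rather than the exact build used here: CMake option names vary between llama.cpp versions, and a Windows build additionally requires AMD's HIP SDK to be installed:

```bash
# Build llama.cpp with the ROCm/HIP backend, targeting the gfx1151 architecture
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
cmake -B build -DGGML_HIP=ON -DAMDGPU_TARGETS=gfx1151 -DCMAKE_BUILD_TYPE=Release
cmake --build build --config Release
```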
## How to Run with llama.cpp
You can run this model locally using the llama.cpp CLI. First download the Q4_K_M file (one way is sketched below), then run the basic generation command that follows.
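For example, Hugging Face's `huggingface-cli` can fetch a single file; the sketch below assumes this repository's ID and the file name from the table above:

```bash
# Download only the Q4_K_M file into the current directory
huggingface-cli download sharpcaterpillar/Qwen3.6-35B-A3B-GGUF \
  Qwen3.6-35B-A3B-Q4_K_M.gguf --local-dir .
```

Then point `llama-cli` at the downloaded file: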
```bash
# Basic chat/generation example
llama-cli.exe -m Qwen3.6-35B-A3B-Q4_K_M.gguf -p "You are a helpful assistant. Please explain the theory of relativity." -n 1024 -c 4096 -ngl 99
```
Parameter explanations:
- `-m`: Path to the downloaded GGUF model.
- `-p`: Your prompt.
- `-n`: Maximum number of tokens to predict/generate.
- `-c`: Context window size (adjust based on your available memory).
- `-ngl`: Number of layers to offload to the GPU. For 35B models, set this to a high number (like 99) if you have 24 GB+ of VRAM to fully offload the model for maximum speed.
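If you prefer an HTTP API over the interactive CLI, llama.cpp also ships `llama-server`, which exposes an OpenAI-compatible endpoint. A minimal sketch, with the port and context size as arbitrary examples:

```bash
# Start an OpenAI-compatible HTTP server on port 8080
llama-server.exe -m Qwen3.6-35B-A3B-Q4_K_M.gguf -c 4096 -ngl 99 --port 8080

# Query it from another terminal
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Explain the theory of relativity."}]}'
```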
## Acknowledgements
All credit for the original model architecture and weights goes to the Qwen Team.