# Qwen3.6-35B-A3B - GGUF

This repository contains GGUF format model files for Qwen/Qwen3.6-35B-A3B.

## Available Files

| File Name | Format | Size | Description |
|---|---|---|---|
| Qwen3.6-35B-A3B-Q4_K_M.gguf | Q4_K_M | ~20.5 GB | **Recommended.** 4-bit quantization using the K-quants method. Offers an excellent balance between inference speed, memory consumption, and model quality. |
| Qwen3.6-35B-A3B-F16.gguf | FP16 | ~70.0 GB | Unquantized 16-bit precision. Extremely large file size. Mainly provided for researchers or users who wish to apply their own custom quantization methods. |

Note: Please check the exact file sizes in the "Files and versions" tab.

## Hardware & Conversion Environment

Unlike most GGUF conversions, which are performed on Nvidia hardware, this one was executed on an AMD APU platform with a large Unified Memory Architecture (UMA) pool:

- **Processor:** AMD RYZEN AI MAX+ 395 w/ Radeon 8060S
- **Memory Configuration:** 128GB Unified Memory
- **Toolchain:** llama.cpp compiled specifically for Windows ROCm (gfx1151 architecture).

The large UMA pool made it possible to hold the 70 GB+ of unquantized FP16 tensors directly in system RAM during conversion, before quantizing them on the APU's compute units.
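For reference, the conversion described above follows llama.cpp's standard two-step flow: `convert_hf_to_gguf.py` produces the FP16 GGUF, and `llama-quantize` produces the Q4_K_M file. The sketch below assumes a local Hugging Face checkout in a `Qwen3.6-35B-A3B` directory; the paths are hypothetical and the heavy steps are guarded so the script is safe to dry-run.

```shell
# Hypothetical local paths (adjust to your setup)
HF_DIR=Qwen3.6-35B-A3B                    # local Hugging Face checkout
F16_OUT=Qwen3.6-35B-A3B-F16.gguf          # intermediate FP16 GGUF (~70 GB)
Q4_OUT=Qwen3.6-35B-A3B-Q4_K_M.gguf        # final quantized file (~20.5 GB)

if command -v llama-quantize >/dev/null && [ -d "$HF_DIR" ]; then
  # Step 1: convert HF safetensors to an unquantized FP16 GGUF
  python convert_hf_to_gguf.py "$HF_DIR" --outfile "$F16_OUT" --outtype f16
  # Step 2: quantize the FP16 GGUF down to Q4_K_M
  llama-quantize "$F16_OUT" "$Q4_OUT" Q4_K_M
fi
```

The intermediate FP16 file can be deleted after step 2 unless you want to keep it for experimenting with other quantization types.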

## How to Run with llama.cpp

You can run this model locally using the llama.cpp CLI. Download the Q4_K_M version and run the following command in your terminal:

```bash
# Basic chat/generation example
llama-cli.exe -m Qwen3.6-35B-A3B-Q4_K_M.gguf -p "You are a helpful assistant. Please explain the theory of relativity." -n 1024 -c 4096 -ngl 99
```

Parameter explanation:

- `-m`: Path to the downloaded GGUF model.
- `-p`: Your prompt.
- `-n`: Maximum number of tokens to predict/generate.
- `-c`: Context window size (adjust based on your available memory).
- `-ngl`: Number of layers to offload to the GPU. For 35B models, set this to a high number (like 99) if you have 24GB+ of VRAM to fully offload the model for maximum speed.
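If you prefer an HTTP endpoint over a one-shot CLI run, llama.cpp also ships `llama-server` (`llama-server.exe` on Windows), which serves an OpenAI-compatible API using the same `-m`, `-c`, and `-ngl` flags. A minimal sketch, with the port chosen arbitrarily:

```shell
MODEL=Qwen3.6-35B-A3B-Q4_K_M.gguf
PORT=8080

# Only launch if the binary and model file are actually present
if command -v llama-server >/dev/null && [ -f "$MODEL" ]; then
  llama-server -m "$MODEL" -c 4096 -ngl 99 --port "$PORT"
fi
```

Once running, clients can send chat requests to `http://localhost:8080/v1/chat/completions` in the OpenAI request format.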

## Acknowledgements

All credit for the original model architecture and weights goes to the Qwen Team.
