Qwen3.6-35B-A3B Rust Code Fine-Tune (MLX)

A fine-tuned version of Qwen3.6-35B-A3B optimized for Rust code generation and comprehension.

The base model is a Mixture-of-Experts architecture with 35B total parameters and 3B active per token (256 experts, 8 active). Quantized to 8-bit.

The model was evaluated with Swival on real-world Rust code.

Training

Fine-tuned on the jedisct1/rust dataset, which contains 356K commits (diffs plus commit messages) from popular Rust repositories.

Training was bidirectional — the model learned in two complementary directions:

  • Forward (instruction to code): given a description of a change, generate the corresponding unified diff
  • Reverse (code to instruction): given a diff, produce a concise description of what it does

This approach teaches both code generation and code understanding simultaneously. After training, the model produces outputs that resemble real commits from production Rust projects rather than textbook examples.
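A sketch of how one commit record could yield both training directions. The field names ("message", "diff") and prompt wording are assumptions for illustration; the actual dataset schema may differ.

```python
def make_samples(commit):
    """Turn one commit record into a forward and a reverse training sample."""
    forward = {
        # Forward: instruction -> unified diff
        "prompt": "Generate a code patch for the following change:\n\n" + commit["message"],
        "completion": commit["diff"],
    }
    reverse = {
        # Reverse: unified diff -> concise description
        "prompt": "Describe what the following Rust code change does:\n\n```diff\n" + commit["diff"] + "\n```",
        "completion": commit["message"],
    }
    return [forward, reverse]

commit = {
    "message": "Replace unwrap() with ? in parse_config()",
    "diff": "--- a/src/config.rs\n+++ b/src/config.rs\n@@ ...",
}
samples = make_samples(commit)
print(len(samples))  # 2
```

Each commit thus contributes two samples, which is roughly consistent with 356K commits expanding into the 634K/33K train/eval split described below.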

Configuration

  • Method: LoRA (rank 8, alpha 16) applied to all 40 layers
  • Target modules: q/k/v/o projections + gate/up/down MLP projections (including MoE expert layers)
  • Dataset: 634K training samples, 33K evaluation samples
  • Iterations: 1000 steps, batch size 1, gradient accumulation 4
  • Learning rate: 2e-5 with cosine schedule
  • Sequence length: 512 tokens
  • Hardware: Apple Silicon (M-series), ~50 GB unified memory
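For a sense of scale, LoRA at rank r adds two small matrices per target projection, contributing r·(d_in + d_out) trainable parameters each. A back-of-the-envelope sketch; the hidden size below is an illustrative assumption, not the real Qwen3.6-35B-A3B dimensions, and the MLP is treated as dense (the MoE expert layers would multiply the MLP share further):

```python
RANK = 8  # from the training config above

def lora_params(d_in: int, d_out: int, rank: int = RANK) -> int:
    # LoRA factors the weight update as B (d_out x r) @ A (r x d_in),
    # adding r * (d_in + d_out) trainable parameters per target matrix.
    return rank * (d_in + d_out)

# Hypothetical per-layer targets: q/k/v/o attention projections plus
# gate/up/down MLP projections, all assumed square at d_model = 4096.
d_model = 4096
per_layer = 7 * lora_params(d_model, d_model)
total = 40 * per_layer  # 40 layers, as in the config above
print(total)
```

Even under these simplified assumptions the adapter stays in the tens of millions of parameters, a tiny fraction of the 35B base model.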

Results

Metric            Start   End
Training loss     1.92    0.69
Validation loss   1.66    0.74
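Assuming the reported losses are per-token cross-entropy in nats (the usual convention), they convert directly to perplexity via exp(loss):

```python
import math

# Convert the cross-entropy losses from the table above into perplexities.
# Perplexity = exp(loss); lower is better.
for name, start, end in [
    ("training", 1.92, 0.69),
    ("validation", 1.66, 0.74),
]:
    print(f"{name}: {math.exp(start):.2f} -> {math.exp(end):.2f}")
```

Under that assumption, validation perplexity drops from roughly 5.3 to about 2.1 over training.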

Usage

Requires mlx-lm:

pip install mlx-lm

Generate a patch from a description

from mlx_lm import load, generate

model, tokenizer = load("jedisct1/Qwen3.6-35B-rust.mlx")

messages = [
    {"role": "system", "content": "You are an expert Rust developer."},
    {"role": "user", "content": "Generate a code patch for the following change:\n\nReplace all .unwrap() calls in parse_config() with proper ? operator error propagation"},
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
response = generate(model, tokenizer, prompt=prompt, max_tokens=500)
print(response)
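The response may wrap the patch in a fenced diff block. A small helper to pull out the raw diff; note that the exact output format is an assumption here, not something the model guarantees:

```python
import re

def extract_diff(response: str) -> str:
    # Pull a fenced ```diff block out of the response if one is present;
    # otherwise return the stripped raw text.
    m = re.search(r"```diff\n(.*?)```", response, re.DOTALL)
    return m.group(1).rstrip("\n") if m else response.strip()
```

The extracted text can then be fed to `git apply` or reviewed by hand.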

Describe a code change

messages = [
    {"role": "system", "content": "You are an expert Rust developer."},
    {"role": "user", "content": "Describe what the following Rust code change does:\n\n```diff\n<your diff here>\n```"},
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
response = generate(model, tokenizer, prompt=prompt, max_tokens=200)
print(response)
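To describe a local change, the same prompt can be built programmatically. This wrapper is a convenience sketch, not part of any published API; you could pass it the output of `git diff`, for example:

```python
def describe_messages(diff: str):
    """Build the chat messages for the reverse (code -> instruction) task."""
    return [
        {"role": "system", "content": "You are an expert Rust developer."},
        {"role": "user", "content": "Describe what the following Rust code change does:\n\n```diff\n" + diff + "\n```"},
    ]

messages = describe_messages("--- a/src/lib.rs\n+++ b/src/lib.rs\n@@ ...")
```

The resulting `messages` list drops into `tokenizer.apply_chat_template` exactly as above.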

Compared to the base model

The fine-tuned model differs from the base model in several ways:

  • Patch format: produces realistic unified diffs with git index hashes, proper context lines, and accurate line numbers, rather than simplified demonstration diffs
  • Code understanding: when asked to describe a diff, it produces concise commit-message-style summaries instead of multi-paragraph explanations
  • Rust idioms: generates patches using patterns commonly found in production Rust codebases (builder patterns, proper error types, doc comment updates)
  • Scope awareness: patches include related changes like import updates, doc example fixes, and test adjustments that a real commit would contain

Limitations

  • Trained with a maximum sequence length of 512 tokens. Patches longer than ~400 tokens of code may be truncated or incomplete.
  • The model was trained on commit-level diffs, so it works best for focused, single-purpose changes rather than large refactors spanning many files.
  • As with any code generation model, outputs should be reviewed before use. The model may produce syntactically valid but semantically incorrect code.
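Given the 512-token training window, it can help to sanity-check prompt size before generating. The sketch below uses a rough ~4-characters-per-token heuristic, which is an assumption; for exact counts, encode the prompt with the loaded model's tokenizer instead:

```python
def rough_token_count(text: str) -> int:
    # Crude heuristic: ~4 characters per token for code-heavy English text.
    # For an exact count, use the model's own tokenizer.
    return max(1, len(text) // 4)

def fits_training_window(prompt: str, max_tokens: int = 512, reserve: int = 100) -> bool:
    # Leave some of the 512-token window free for the generated patch.
    return rough_token_count(prompt) + reserve <= max_tokens
```

If the check fails, splitting the request into smaller, single-purpose changes plays to the model's commit-level training.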