Qwen3.6-35B-A3B Rust Code Fine-Tune (MLX)

A fine-tuned version of Qwen3.6-35B-A3B optimized for Rust code generation and comprehension.

The base model is a Mixture-of-Experts architecture with 35B total parameters and 3B active per token (256 experts, 8 active). Quantized to 8-bit.

The model was evaluated with Swival on real-world Rust code.

Training

Fine-tuned on the jedisct1/rust dataset, which contains 356K commits (diffs plus commit messages) from popular Rust repositories.

Training was bidirectional — the model learned in two complementary directions:

  • Forward (instruction to code): given a description of a change, generate the corresponding unified diff
  • Reverse (code to instruction): given a diff, produce a concise description of what it does

This approach teaches both code generation and code understanding simultaneously. After training, the model produces outputs that resemble real commits from production Rust projects rather than textbook examples.
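A sketch of how one commit record could yield both training directions. The field names ("message", "diff") and prompt wording are assumptions for illustration; the actual dataset schema may differ.

```python
def make_samples(commit):
    """Turn one commit record into a forward and a reverse training sample."""
    forward = {
        # Forward: instruction -> unified diff
        "prompt": "Generate a code patch for the following change:\n\n" + commit["message"],
        "completion": commit["diff"],
    }
    reverse = {
        # Reverse: unified diff -> concise description
        "prompt": "Describe what the following Rust code change does:\n\n```diff\n" + commit["diff"] + "\n```",
        "completion": commit["message"],
    }
    return [forward, reverse]

commit = {
    "message": "Replace unwrap() with ? in parse_config()",
    "diff": "--- a/src/config.rs\n+++ b/src/config.rs\n@@ ...",
}
samples = make_samples(commit)
print(len(samples))  # 2
```

Each commit thus contributes two samples, which is roughly consistent with 356K commits expanding into the 634K/33K train/eval split described below.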

Configuration

  • Method: LoRA (rank 8, alpha 16) applied to all 40 layers
  • Target modules: q/k/v/o projections + gate/up/down MLP projections (including MoE expert layers)
  • Dataset: 634K training samples, 33K evaluation samples
  • Iterations: 1000 steps, batch size 1, gradient accumulation 4
  • Learning rate: 2e-5 with cosine schedule
  • Sequence length: 512 tokens
  • Hardware: Apple Silicon (M-series), ~50 GB unified memory
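For a sense of scale, LoRA at rank r adds two small matrices per target projection, contributing r·(d_in + d_out) trainable parameters each. A back-of-the-envelope sketch; the hidden size below is an illustrative assumption, not the real Qwen3.6-35B-A3B dimensions, and the MLP is treated as dense (the MoE expert layers would multiply the MLP share further):

```python
RANK = 8  # from the training config above

def lora_params(d_in: int, d_out: int, rank: int = RANK) -> int:
    # LoRA factors the weight update as B (d_out x r) @ A (r x d_in),
    # adding r * (d_in + d_out) trainable parameters per target matrix.
    return rank * (d_in + d_out)

# Hypothetical per-layer targets: q/k/v/o attention projections plus
# gate/up/down MLP projections, all assumed square at d_model = 4096.
d_model = 4096
per_layer = 7 * lora_params(d_model, d_model)
total = 40 * per_layer  # 40 layers, as in the config above
print(total)
```

Even under these simplified assumptions the adapter stays in the tens of millions of parameters, a tiny fraction of the 35B base model.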

Results

Metric            Start   End
Training loss     1.92    0.69
Validation loss   1.66    0.74
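Assuming the reported losses are per-token cross-entropy in nats (the usual convention), they convert directly to perplexity via exp(loss):

```python
import math

# Convert the cross-entropy losses from the table above into perplexities.
# Perplexity = exp(loss); lower is better.
for name, start, end in [
    ("training", 1.92, 0.69),
    ("validation", 1.66, 0.74),
]:
    print(f"{name}: {math.exp(start):.2f} -> {math.exp(end):.2f}")
```

Under that assumption, validation perplexity drops from roughly 5.3 to about 2.1 over training.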

Usage

Requires mlx-lm:

pip install mlx-lm

Generate a patch from a description

from mlx_lm import load, generate

model, tokenizer = load("jedisct1/Qwen3.6-35B-rust.mlx")

messages = [
    {"role": "system", "content": "You are an expert Rust developer."},
    {"role": "user", "content": "Generate a code patch for the following change:\n\nReplace all .unwrap() calls in parse_config() with proper ? operator error propagation"},
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
response = generate(model, tokenizer, prompt=prompt, max_tokens=500)
print(response)
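The response may wrap the patch in a fenced diff block. A small helper to pull out the raw diff; note that the exact output format is an assumption here, not something the model guarantees:

```python
import re

def extract_diff(response: str) -> str:
    # Pull a fenced ```diff block out of the response if one is present;
    # otherwise return the stripped raw text.
    m = re.search(r"```diff\n(.*?)```", response, re.DOTALL)
    return m.group(1).rstrip("\n") if m else response.strip()
```

The extracted text can then be fed to `git apply` or reviewed by hand.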

Describe a code change

messages = [
    {"role": "system", "content": "You are an expert Rust developer."},
    {"role": "user", "content": "Describe what the following Rust code change does:\n\n```diff\n<your diff here>\n```"},
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
response = generate(model, tokenizer, prompt=prompt, max_tokens=200)
print(response)
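To describe a local change, the same prompt can be built programmatically. This wrapper is a convenience sketch, not part of any published API; you could pass it the output of `git diff`, for example:

```python
def describe_messages(diff: str):
    """Build the chat messages for the reverse (code -> instruction) task."""
    return [
        {"role": "system", "content": "You are an expert Rust developer."},
        {"role": "user", "content": "Describe what the following Rust code change does:\n\n```diff\n" + diff + "\n```"},
    ]

messages = describe_messages("--- a/src/lib.rs\n+++ b/src/lib.rs\n@@ ...")
```

The resulting `messages` list drops into `tokenizer.apply_chat_template` exactly as above.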

Compared to the base model

The fine-tuned model differs from the base model in several ways:

  • Patch format: produces realistic unified diffs with git index hashes, proper context lines, and accurate line numbers, rather than simplified demonstration diffs
  • Code understanding: when asked to describe a diff, it produces concise commit-message-style summaries instead of multi-paragraph explanations
  • Rust idioms: generates patches using patterns commonly found in production Rust codebases (builder patterns, proper error types, doc comment updates)
  • Scope awareness: patches include related changes like import updates, doc example fixes, and test adjustments that a real commit would contain

Limitations

  • Trained with a maximum sequence length of 512 tokens. Patches longer than ~400 tokens of code may be truncated or incomplete.
  • The model was trained on commit-level diffs, so it works best for focused, single-purpose changes rather than large refactors spanning many files.
  • As with any code generation model, outputs should be reviewed before use. The model may produce syntactically valid but semantically incorrect code.
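Given the 512-token training window, it can help to sanity-check prompt size before generating. The sketch below uses a rough ~4-characters-per-token heuristic, which is an assumption; for exact counts, encode the prompt with the loaded model's tokenizer instead:

```python
def rough_token_count(text: str) -> int:
    # Crude heuristic: ~4 characters per token for code-heavy English text.
    # For an exact count, use the model's own tokenizer.
    return max(1, len(text) // 4)

def fits_training_window(prompt: str, max_tokens: int = 512, reserve: int = 100) -> bool:
    # Leave some of the 512-token window free for the generated patch.
    return rough_token_count(prompt) + reserve <= max_tokens
```

If the check fails, splitting the request into smaller, single-purpose changes plays to the model's commit-level training.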