Gemma 4 31B IT - Route B per-head+k37
This repository contains the Route B release built on top of
google/gemma-4-31B-it.
What it is
Route B is a packed ternary weight representation for large language
models. In this release, the strongest current long-context checkpoint
uses per-head prosody in attention and a one-layer keep override for
layers.37.k_proj.
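To illustrate what "packed ternary" means in general terms, here is a minimal sketch of one common packing scheme: since each weight takes only three values (-1, 0, +1), five base-3 digits fit in a single byte (3^5 = 243 <= 256). This is an illustrative example only; the actual Route B pack format is defined by the code under kernel/ and may differ.

```python
def pack_ternary(weights):
    """Pack a flat list of ternary weights (-1, 0, +1) into bytes.

    Five base-3 digits fit in one byte, so the packed size is
    roughly len(weights) / 5 bytes (illustrative scheme only).
    """
    out = bytearray()
    for i in range(0, len(weights), 5):
        byte = 0
        for j, w in enumerate(weights[i:i + 5]):
            byte += (w + 1) * 3 ** j  # map -1/0/+1 -> base-3 digit 0/1/2
        out.append(byte)
    return bytes(out)


def unpack_ternary(data, n):
    """Inverse of pack_ternary; n is the original weight count
    (needed because the last byte may carry padding digits)."""
    vals = []
    for byte in data:
        for _ in range(5):
            vals.append(byte % 3 - 1)  # digit 0/1/2 -> weight -1/0/+1
            byte //= 3
    return vals[:n]
```

A round trip (`unpack_ternary(pack_ternary(w), len(w)) == w`) recovers the original weights exactly, which is the property any such packing must preserve.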
Contents
- packed Route B checkpoint or materialized HF export;
- minimal runtime under runtime/;
- minimal conversion/publishing entrypoints under kernel/;
- short architecture note in kernel/ARCHITECTURE.md.
How to load
If this repo contains the packed .pt checkpoint, use the Route B
runtime in runtime/load_model.py.
If this repo contains a materialized Hugging Face export, load it as a
standard transformers model.
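The two cases above can be sketched as a single dispatch helper. This is a hedged sketch: `load_model` in runtime/load_model.py is an assumed entrypoint name (check that file for the real one), and the packed-checkpoint detection by a .pt suffix is an assumption about this release's layout.

```python
from pathlib import Path


def checkpoint_kind(repo_dir):
    """Return 'packed' if the repo ships a packed .pt checkpoint,
    else 'hf-export' for a materialized transformers layout.
    (Assumes packed checkpoints use a .pt suffix.)"""
    return "packed" if any(Path(repo_dir).glob("*.pt")) else "hf-export"


def load_route_b(repo_dir):
    """Load this release either via the Route B runtime or as a
    standard transformers model, depending on repo contents."""
    if checkpoint_kind(repo_dir) == "packed":
        # Hypothetical entrypoint name; see runtime/load_model.py
        # for the actual loader this release provides.
        from runtime.load_model import load_model
        return load_model(str(next(Path(repo_dir).glob("*.pt"))))
    # Materialized HF export: standard transformers loading.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    return (AutoModelForCausalLM.from_pretrained(repo_dir),
            AutoTokenizer.from_pretrained(repo_dir))
```

Keeping the heavy imports inside `load_route_b` means `checkpoint_kind` can be used to inspect a repo without pulling in torch or transformers.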
Important notes
- This is a derived release based on google/gemma-4-31B-it.
- Redistribution requires compliance with the Gemma license.
- The strongest settled quality numbers are in the accompanying paper and archiv.org/benchmarks/results.md.
Reference
Arman Aubakirov. A Discrete Weight Language for Large Language Models: Compressing Gemma 4 31B with a 5-Step Ternary Route. Technical report, April 2026.