Gemma 4 31B IT – Route B per-head+k37

This repository contains the Route B release built on top of google/gemma-4-31B-it.

What it is

Route B is a packed ternary weight representation for large language models. In this release, the strongest current long-context checkpoint uses per-head prosody in attention and a one-layer keep override for layers.37.k_proj.
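Route B's exact on-disk layout is not documented in this card, but the general idea of a packed ternary representation can be sketched: each weight takes one of three values {-1, 0, +1}, so four weights fit in a single byte at 2 bits each. The code and bit layout below are illustrative assumptions, not the actual Route B format.

```python
def pack_ternary(weights):
    """Pack a list of ternary weights (-1, 0, +1) into bytes, 4 per byte.

    Illustrative layout: 2-bit codes, little-endian within each byte.
    """
    codes = {-1: 0b00, 0: 0b01, 1: 0b10}  # assumed 2-bit code per trit
    packed = bytearray()
    for i in range(0, len(weights), 4):
        byte = 0
        for j, w in enumerate(weights[i:i + 4]):
            byte |= codes[w] << (2 * j)
        packed.append(byte)
    return bytes(packed)


def unpack_ternary(packed, n):
    """Inverse of pack_ternary; n is the original number of weights."""
    decode = {0b00: -1, 0b01: 0, 0b10: 1}
    out = []
    for byte in packed:
        for j in range(4):
            if len(out) == n:
                break
            out.append(decode[(byte >> (2 * j)) & 0b11])
    return out


# Round trip: packing then unpacking recovers the original weights.
ws = [-1, 0, 1, 1, 0, -1, 0]
assert unpack_ternary(pack_ternary(ws), len(ws)) == ws
```

At 2 bits per weight this is a 8x size reduction versus fp16 storage before any scale metadata; real packed formats also carry per-tensor or per-head scale factors alongside the trits.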

Contents

  • packed Route B checkpoint or a materialized HF export;
  • minimal runtime under runtime/;
  • minimal conversion/publishing entrypoints under kernel/;
  • short architecture note in kernel/ARCHITECTURE.md.

How to load

If this repo contains the packed .pt checkpoint, use the Route B runtime in runtime/load_model.py.
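A minimal sketch of the packed-checkpoint path. The function name `load_model` and the checkpoint filename are assumptions; check `runtime/load_model.py` for the actual entrypoint and arguments.

```python
def load_packed_checkpoint(path="route_b_packed.pt"):  # filename is an assumption
    # Deferred import so this sketch stays importable without the repo's
    # runtime/ package on sys.path.
    from runtime.load_model import load_model  # hypothetical API name
    return load_model(path)
```

Run this from the repository root so that `runtime/` resolves as a package.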

If this repo contains a materialized Hugging Face export, load it as a standard transformers model.
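For the materialized export, the standard `transformers` loading path applies. The repo id is taken from this card; the dtype choice is illustrative.

```python
def load_hf_export(repo_id="armanibadboy/gemma-4-31b-it-route-b-perhead-k37"):
    # Deferred imports keep this sketch importable without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    model = AutoModelForCausalLM.from_pretrained(repo_id, torch_dtype="auto")
    return tokenizer, model
```

Calling `load_hf_export()` downloads the export from the Hub on first use; a 31B model needs substantial RAM or VRAM, so consider `device_map="auto"` with `accelerate` installed.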

Important notes

  • This is a derived release based on google/gemma-4-31B-it.
  • Redistribution requires compliance with the Gemma license.
  • The definitive quality numbers are reported in the accompanying paper and in archiv.org/benchmarks/results.md.

Reference

Arman Aubakirov. A Discrete Weight Language for Large Language Models: Compressing Gemma 4 31B with a 5-Step Ternary Route. Technical report, April 2026.

Model: armanibadboy/gemma-4-31b-it-route-b-perhead-k37