Step-3.7-Flash-JANG_K

JANG affine conversion of stepfun-ai/Step-3.7-Flash-NVFP4.

This JANG_K variant keeps the proven Step JANG text runtime path and applies the following bit-width policy to the routed experts:

gate_proj / up_proj / down_proj = 4 / 2 / 2 bits

It is the affine K-lane comparison point for the experimental Step JANGTQ_2K work.

Status

Verified locally:

  • 58 safetensors shards
  • 2,570 indexed tensors
  • no raw NVFP4 weight_scale, weight_scale_2, or input_scale sidecars in the output index
  • jang_config.json capability verification passes
  • text generation proof passes through the bundled step3p7_mlx.py bridge
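
The index checks above can be reproduced with a short script. This is a minimal sketch, not the verification tool used for this bundle: it assumes the standard Hugging Face index layout (a JSON file with a `weight_map` from tensor name to shard file) and assumes the NVFP4 sidecars appear as name suffixes. The function names are hypothetical.

```python
import json
from pathlib import Path

# Assumed suffixes of the raw NVFP4 sidecar tensors that should be absent
# from the converted output index.
NVFP4_SIDECAR_SUFFIXES = (".weight_scale", ".weight_scale_2", ".input_scale")


def summarize_index(weight_map: dict[str, str]) -> dict:
    """Summarize a safetensors index weight_map (tensor name -> shard file)."""
    leftovers = sorted(
        name for name in weight_map if name.endswith(NVFP4_SIDECAR_SUFFIXES)
    )
    return {
        "shards": len(set(weight_map.values())),  # distinct shard files
        "tensors": len(weight_map),               # indexed tensor names
        "nvfp4_sidecars": leftovers,              # should be an empty list
    }


def summarize_index_file(path: str) -> dict:
    """Load an index JSON from disk (the path is supplied by the caller)."""
    index = json.loads(Path(path).read_text(encoding="utf-8"))
    return summarize_index(index["weight_map"])
```

If the assumptions hold, running `summarize_index_file` on this bundle's index should report 58 shards, 2,570 tensors, and an empty `nvfp4_sidecars` list, matching the figures above.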

Text proof:

{
  "prompt": "What is 2+2? Answer with only the number.",
  "output": "The user is asking \"What is 2+2? Answer with only the number.\" So the answer is 4. The user wants only the number, so I should just output \"4\".\\n</think>\\n4",
  "prompt_tokens": 26,
  "generated_tokens": 43,
  "contains_final_4": true
}

Warmed decode proof:

{
  "measured_tokens": 32,
  "decode_s": 0.8008251190185547,
  "tok_s": 39.95878655656726
}
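
The reported `tok_s` is consistent with dividing `measured_tokens` by `decode_s`. The sketch below recomputes it from the proof values; the helper name is hypothetical, and it assumes the throughput was derived this way rather than measured separately.

```python
def decode_tokens_per_second(measured_tokens: int, decode_s: float) -> float:
    """Decode throughput as generated tokens divided by wall-clock seconds."""
    if decode_s <= 0:
        raise ValueError("decode_s must be positive")
    return measured_tokens / decode_s


# Values copied from the warmed decode proof above.
proof = {
    "measured_tokens": 32,
    "decode_s": 0.8008251190185547,
    "tok_s": 39.95878655656726,
}

recomputed = decode_tokens_per_second(proof["measured_tokens"], proof["decode_s"])
print(f"recomputed tok/s: {recomputed:.2f}")
```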

Format

  • Format: JANG affine
  • Profile: JANG_K
  • Routed expert policy (bits): gate_proj=4, up_proj=2, down_proj=2
  • Attention, router gates, dense/shared MLP, embeddings, and lm head follow the proven Step JANG_2L runtime policy
  • Vision/projector tensors are included as F16 passthrough
  • Audio tensors: none in the source checkpoint
  • MTP tensors: none in the source checkpoint
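
The routed expert policy can be written as a simple lookup. This is an illustrative sketch only: the helper names are hypothetical, and the average assumes the three projections hold equally many weights and ignores any per-group scale or bias overhead.

```python
# Bit widths stated in this card for the routed-expert projections.
ROUTED_EXPERT_BITS = {"gate_proj": 4, "up_proj": 2, "down_proj": 2}


def routed_expert_bits(projection: str) -> int:
    """Return the JANG_K bit width for a routed-expert projection name."""
    try:
        return ROUTED_EXPERT_BITS[projection]
    except KeyError:
        raise ValueError(f"not a routed-expert projection: {projection!r}") from None


def mean_bits_equal_sizes() -> float:
    """Average bits per routed-expert weight, ASSUMING equal-sized projections."""
    return sum(ROUTED_EXPERT_BITS.values()) / len(ROUTED_EXPERT_BITS)


print(f"mean routed-expert bits (equal sizes assumed): {mean_bits_equal_sizes():.3f}")
```

Tensors outside the routed experts are not covered by this lookup; they follow the Step JANG_2L runtime policy listed above.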

Runtime

The bundled step3p7_mlx.py bridge maps the nested Step3p7 text config to MLX's Step3p5 text runtime and drops vision tensors for text-only generation.

Required text runtime behavior:

  • load model_file=step3p7_mlx.py
  • preserve the source chat template; it opens the assistant generation prompt inside <think>
  • use normal KV cache with Step full/sliding attention behavior from the Step3p5 MLX runtime
  • do not add a second synthetic reasoning prefix
  • use PreTrainedTokenizerFast; the source tokenizer metadata otherwise chooses a Llama tokenizer class that decodes byte-level markers incorrectly
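
Two of these requirements, the tokenizer class and the single reasoning prefix, can be sketched as follows. This is a hedged illustration rather than the bundled bridge's code: `load_fast_tokenizer` and `finalize_generation_prompt` are hypothetical helpers, and the check assumes the rendered prompt ends with `<think>` (optionally followed by whitespace), as the template description above suggests.

```python
THINK_OPEN = "<think>"


def load_fast_tokenizer(model_dir: str):
    """Load the tokenizer with the fast class, not the class named in the metadata."""
    # Imported lazily so the prompt helper below works without transformers installed.
    from transformers import PreTrainedTokenizerFast

    return PreTrainedTokenizerFast.from_pretrained(model_dir)


def finalize_generation_prompt(rendered_prompt: str) -> str:
    """Return the prompt unchanged, confirming the template already opened <think>."""
    if rendered_prompt.rstrip().endswith(THINK_OPEN):
        # The template already opened the reasoning block: add nothing.
        return rendered_prompt
    # Flag an unexpected template instead of silently patching the prompt.
    raise ValueError("chat template did not open the assistant turn inside <think>")
```

In use, the rendered prompt would typically come from `tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)` and pass through `finalize_generation_prompt` before generation.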

Full image-input VLM coherence is not claimed by this artifact. The vision weights are present, but image patch expansion and projector routing still need a Step3p7 VLM wrapper in the target runtime.

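Summary

This bundle is a conversion of Step-3.7-Flash-NVFP4 using the JANG_K affine 4/2/2 expert bit policy. The text path has passed local MLX generation verification. The vision weights are included, but the image-input path still requires a separate runtime implementation and verification.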
