cohere-transcribe-03-2026-mlx-8bit
Quantized MLX weights for beshkenadze/cohere-transcribe-03-2026-mlx-fp16.
Variant
- Precision: 8-bit
- Quantization mode: affine
- Group size: 64
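To illustrate what these three settings mean, here is a minimal pure-Python sketch of affine quantization applied per group of 64 values with 8-bit codes. It shows the general technique only and is not the MLX implementation; the function names, the random test row, and the assumption that each group stores one fp16 scale and one fp16 bias are all illustrative and do not come from this repository.

```python
import random

GROUP_SIZE = 64            # "Group size" from the Variant section
BITS = 8                   # "Precision" from the Variant section
LEVELS = (1 << BITS) - 1   # 255 steps between a group's minimum and maximum


def quantize_group(values):
    """Map one group of floats to integer codes plus a (scale, bias) pair."""
    lo, hi = min(values), max(values)
    # Fall back to 1.0 so a constant group does not divide by zero.
    scale = (hi - lo) / LEVELS or 1.0
    codes = [round((v - lo) / scale) for v in values]
    return codes, scale, lo


def dequantize_group(codes, scale, bias):
    """Reconstruct approximate floats: w ~= scale * code + bias."""
    return [scale * c + bias for c in codes]


def quantize_row(row):
    """Quantize a row group by group; its length must be a multiple of GROUP_SIZE."""
    assert len(row) % GROUP_SIZE == 0
    return [quantize_group(row[i:i + GROUP_SIZE])
            for i in range(0, len(row), GROUP_SIZE)]


def dequantize_row(groups):
    """Concatenate the reconstructed values of every group."""
    out = []
    for codes, scale, bias in groups:
        out.extend(dequantize_group(codes, scale, bias))
    return out


def bits_per_weight(scale_bits=16, bias_bits=16):
    """Storage cost per weight. Assumption: one fp16 scale and one fp16 bias per group."""
    return BITS + (scale_bits + bias_bits) / GROUP_SIZE


# Hypothetical weight row: two groups of small Gaussian values.
rng = random.Random(0)
row = [rng.gauss(0.0, 0.02) for _ in range(2 * GROUP_SIZE)]
groups = quantize_row(row)
restored = dequantize_row(groups)
max_error = max(abs(a - b) for a, b in zip(row, restored))
```

Under the stated storage assumption, `bits_per_weight()` evaluates to 8.5, compared with 16 bits per weight for fp16. The reconstruction error within each group is bounded by half of that group's scale, which is why a smaller group size generally trades extra metadata for lower error.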
Files
- model.safetensors
- config.json
- tokenizer.model
- tokenizer_config.json
- preprocessor_config.json
- special_tokens_map.json
- key_map.json
- conversion_summary.json
Repo-sample benchmark
Sample: Tests/media/conversational_a.wav
- Generation TPS: 352.9
- Peak memory: 2.87 GB
- Output:
Coffee's story likely begins in Ethiopia, where legend tells of a goat herder named Kaldi, who noticed his goats became energetic after eating red berries from a particular bush; curious, he tried them himself and felt invigorated.
Parity note
This checkpoint has been re-validated against the current Swift and Python MLX runtimes.
Verified semantic parity on an English fixture:
This is a test recording in English. I am speaking clearly at a normal speed. Please transcribe this sentence exactly as I said.
Matched across:
- Swift MLX fp16
- Swift MLX 8-bit
- Python MLX fp16
- Python MLX 8-bit
- official CUDA reference path (transformers native Cohere ASR)
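A parity check of this kind could be scripted roughly as follows. This is a sketch under assumptions: the card does not define "semantic parity", so the code treats it as an exact match after lowercasing and removing ASCII punctuation, which is only one possible reading. The runtime labels and the transcripts in `outputs` are hypothetical placeholders, not recorded results; in practice each string would come from running the corresponding runtime on the fixture audio.

```python
import string

FIXTURE = (
    "This is a test recording in English. I am speaking clearly at a normal "
    "speed. Please transcribe this sentence exactly as I said."
)


def normalize(text):
    """Lowercase, drop ASCII punctuation, and collapse runs of whitespace."""
    table = str.maketrans("", "", string.punctuation)
    return " ".join(text.lower().translate(table).split())


def parity_report(reference, outputs):
    """Return {runtime: bool}, True where the normalized transcript equals the reference."""
    target = normalize(reference)
    return {name: normalize(text) == target for name, text in outputs.items()}


# Hypothetical placeholder transcripts keyed by illustrative runtime labels.
outputs = {
    "swift-mlx-fp16": FIXTURE,
    "python-mlx-8bit": FIXTURE.replace(". ", ", ").lower(),
}
report = parity_report(FIXTURE, outputs)
mismatches = [name for name, ok in report.items() if not ok]
```

Comparing normalized text rather than raw strings keeps the check tolerant of casing and punctuation differences between runtimes while still flagging any changed, missing, or extra word.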
Quality note
On the repo sample, the 8-bit output matches the fp16 output while substantially reducing memory use (2.87 GB peak for this checkpoint).
Notes
- Generated from the Swift-compatible fp16 checkpoint beshkenadze/cohere-transcribe-03-2026-mlx-fp16.
- This repository contains inference artifacts only. Refer to the upstream Cohere model card and license for original model details.
Model tree for beshkenadze/cohere-transcribe-03-2026-mlx-8bit
Base model: CohereLabs/cohere-transcribe-03-2026