cohere-transcribe-03-2026-mlx-8bit

8-bit quantized MLX weights derived from beshkenadze/cohere-transcribe-03-2026-mlx-fp16.

Variant

  • Precision: 8-bit
  • Quantization mode: affine
  • Group size: 64
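
The settings above describe standard group-wise affine quantization: each run of 64 consecutive weights gets its own scale and offset, and values are stored as 8-bit integers. A minimal pure-Python sketch of the scheme (illustrative only; MLX's native kernels pack and store the values differently):

```python
# Group-wise affine (asymmetric) quantization sketch: 8 bits, group size 64.
GROUP_SIZE = 64
BITS = 8
QMAX = (1 << BITS) - 1  # 255

def quantize_group(xs):
    """Map a group of floats to ints in [0, QMAX] plus (scale, offset)."""
    lo, hi = min(xs), max(xs)
    scale = (hi - lo) / QMAX or 1.0  # avoid div-by-zero for constant groups
    q = [round((x - lo) / scale) for x in xs]
    return q, scale, lo

def dequantize_group(q, scale, lo):
    """Recover approximate floats from quantized ints."""
    return [v * scale + lo for v in q]

def quantize(weights):
    """Quantize a flat weight list in independent groups of GROUP_SIZE."""
    return [quantize_group(weights[i:i + GROUP_SIZE])
            for i in range(0, len(weights), GROUP_SIZE)]
```

Per-group scales keep the round-trip error bounded by half a quantization step within each group, which is why 8-bit affine weights can track fp16 output closely while using roughly a quarter of the memory for the weight tensors.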

Files

  • model.safetensors
  • config.json
  • tokenizer.model
  • tokenizer_config.json
  • preprocessor_config.json
  • special_tokens_map.json
  • key_map.json
  • conversion_summary.json
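
MLX conversions typically record the quantization settings in config.json under a `quantization` key (the key layout below follows the common mlx-lm convention and is an assumption about this repo; check the file itself). A minimal sketch for reading it:

```python
import json

def read_quant_settings(config_path):
    """Return (bits, group_size) from an MLX-style config.json,
    assuming the common mlx-lm `quantization` key layout."""
    with open(config_path) as f:
        cfg = json.load(f)
    q = cfg.get("quantization", {})
    return q.get("bits"), q.get("group_size")
```

For this checkpoint the expected values would be bits=8 and group_size=64, matching the variant listed above.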

Repo-sample benchmark

Sample: Tests/media/conversational_a.wav

  • Generation speed: 352.9 tokens/s
  • Peak memory: 2.87 GB
  • Output: Coffee's story likely begins in Ethiopia, where legend tells of a goat herder named Kaldi, who noticed his goats became energetic after eating red berries from a particular bush; curious, he tried them himself and felt invigorated.

Parity note

This checkpoint has been re-validated against the current Swift and Python MLX runtimes.

Verified semantic parity on an English fixture:

This is a test recording in English. I am speaking clearly at a normal speed. Please transcribe this sentence exactly as I said.

Matched across:

  • Swift MLX fp16
  • Swift MLX 8-bit
  • Python MLX fp16
  • Python MLX 8-bit
  • official CUDA reference path (transformers native Cohere ASR)

Quality note

Output matches the fp16 checkpoint on the repo sample while substantially reducing peak memory.

Notes

  • Generated from the Swift-compatible fp16 checkpoint beshkenadze/cohere-transcribe-03-2026-mlx-fp16.
  • This repository contains inference artifacts only. Refer to the upstream Cohere model card and license for original model details.