parakeet-tdt-0.6b-v3-basque — ONNX Export
ONNX export of itzune/parakeet-tdt-0.6b-v3-basque, a Basque ASR model fine-tuned from nvidia/parakeet-tdt-0.6b-v3 (FastConformer-TDT, 600M parameters).
For NeMo-native inference, use the `.nemo` model at itzune/parakeet-tdt-0.6b-v3-basque instead.
Evaluation results
| Test split | WER (%) | Description |
|---|---|---|
| test_cv | 6.92 | Mozilla Common Voice Basque |
| test_parl | 4.36 | Basque Parliament recordings |
| test_oslr | 14.52 | OpenSLR Basque corpus |
Baseline (no fine-tuning): >100% WER on all splits — Basque is not a supported language in the original parakeet-tdt-0.6b-v3.
Repository contents
This repo contains the ONNX export of the model. Because of the RNNT/TDT architecture and the ONNX protobuf 2 GB size limit, the export consists of multiple files:
encoder-parakeet-tdt-0.6b-v3-basque.onnx # Encoder graph (~2 MB, references external data)
decoder_joint-parakeet-tdt-0.6b-v3-basque.onnx # Decoder+Joint graph (~70 MB, references external data)
onnx__MatMul_* # External tensor weight files (~2.4 GB total, ~291 files)
onnx_export_files.txt # Manifest listing all files
All files must be present together — the .onnx graphs reference the external weight files by relative path. You cannot use the .onnx files alone.
Why two ONNX graphs?
The RNNT/TDT architecture has two distinct networks with incompatible execution patterns:
- Encoder: runs once over the full audio spectrogram to produce hidden states.
- Decoder + Joint: runs iteratively, one step per output token, combining encoder hidden states with the previous predicted token.
These cannot be merged into a single ONNX graph. Any RNNT-capable ONNX inference runtime (NVIDIA Triton, ONNX Runtime with custom RNNT decoding) expects this two-graph layout.
Usage with ONNX Runtime
Note: Running RNNT/TDT decoding manually with ONNX Runtime requires implementing the token-by-token beam/greedy search loop yourself. For most use cases, using the NeMo model directly is simpler.
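To make the shape of that loop concrete, here is a minimal greedy TDT decoding sketch. It is not NeMo's exact algorithm: the `decode_step` callable, the duration table `[0, 1, 2, 3, 4]`, and `blank_id` are assumptions — in practice `decode_step` would wrap a `decoder_session.run(...)` call with the graph's actual tensor names, and the duration table and blank index come from the model config.

```python
import numpy as np

def greedy_tdt_decode(encoder_out, decode_step, blank_id, durations, max_symbols=10):
    """Greedy decoding loop for a Token-and-Duration Transducer (sketch).

    encoder_out: per-frame encoder states, shape (T, D).
    decode_step: callable(prev_token, state, enc_frame) ->
                 (token_logits, duration_logits, new_state); hypothetical
                 wrapper around the decoder_joint ONNX session.
    durations:   duration table the joint's duration head indexes into.
    """
    hyp, prev, state = [], blank_id, None
    t, emitted = 0, 0
    while t < len(encoder_out):
        tok_logits, dur_logits, new_state = decode_step(prev, state, encoder_out[t])
        token = int(np.argmax(tok_logits))
        dur = durations[int(np.argmax(dur_logits))]
        if token != blank_id:            # emit token, advance prediction network
            hyp.append(token)
            prev, state = token, new_state
            emitted += 1
        if dur == 0 and (token == blank_id or emitted >= max_symbols):
            dur = 1                      # force progress to avoid an infinite loop
        if dur > 0:
            t += dur                     # TDT: skip `dur` encoder frames at once
            emitted = 0
    return hyp
```

The duration head is what distinguishes TDT from plain RNNT: instead of stepping one frame per blank, the model predicts how many encoder frames to skip, which is why the loop advances `t` by `dur`.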
1. Clone the repo and install dependencies
git clone https://huggingface.co/itzune/parakeet-tdt-0.6b-v3-basque-ONNX
pip install onnxruntime-gpu numpy soundfile
2. Load the encoder
import onnxruntime as ort
encoder_session = ort.InferenceSession(
"encoder-parakeet-tdt-0.6b-v3-basque.onnx",
providers=["CUDAExecutionProvider", "CPUExecutionProvider"]
)
decoder_session = ort.InferenceSession(
"decoder_joint-parakeet-tdt-0.6b-v3-basque.onnx",
providers=["CUDAExecutionProvider", "CPUExecutionProvider"]
)
The external weight files must be in the same directory as the `.onnx` files when loading.
3. Preprocessing
The model expects 80-dimensional log-mel spectrograms, matching NeMo's AudioToMelSpectrogramPreprocessor:
- Sample rate: 16 kHz
- Window size: 25 ms (400 samples)
- Window stride: 10 ms (160 samples)
- Mel bins: 80
- Normalization: per-feature mean/std (as computed by NeMo preprocessor)
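The settings above can be sketched as a NumPy-only front end. This is an approximation, not NeMo's exact preprocessor: the Hann window, `n_fft=512`, HTK mel scale, and log floor are assumptions — check the model's preprocessor config before relying on it for real transcription.

```python
import numpy as np

def log_mel_features(audio, sr=16000, n_fft=512, win=400, hop=160, n_mels=80):
    """Approximate NeMo-style log-mel features: (n_mels, n_frames) float32."""
    # Framed STFT with a Hann window (25 ms window, 10 ms stride at 16 kHz)
    window = np.hanning(win)
    n_frames = 1 + (len(audio) - win) // hop
    frames = np.stack([audio[i * hop:i * hop + win] * window
                       for i in range(n_frames)])
    spec = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2      # power spectrum

    # Triangular mel filterbank on the HTK mel scale (assumption)
    def to_mel(f): return 2595.0 * np.log10(1.0 + f / 700.0)
    def to_hz(m): return 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mel_pts = to_hz(np.linspace(to_mel(0.0), to_mel(sr / 2), n_mels + 2))
    bins = np.floor((n_fft + 1) * mel_pts / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        fb[m - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fb[m - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)

    logmel = np.log(fb @ spec.T + 2 ** -24)               # floor avoids log(0)
    # Per-feature mean/std normalization over time, as in the NeMo preprocessor
    logmel = (logmel - logmel.mean(axis=1, keepdims=True)) / (
        logmel.std(axis=1, keepdims=True) + 1e-5)
    return logmel.astype(np.float32)
```

One second of 16 kHz audio yields a `(80, 98)` feature matrix with this framing.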
Recommended: use the NeMo model
For production inference without having to reimplement the RNNT decoding loop, use the .nemo model with NeMo or the nemo_toolkit Python package:
pip install "nemo_toolkit[asr]"
import nemo.collections.asr as nemo_asr
model = nemo_asr.models.ASRModel.from_pretrained("itzune/parakeet-tdt-0.6b-v3-basque")
transcriptions = model.transcribe(["audio.wav"])
print(transcriptions[0])
See itzune/parakeet-tdt-0.6b-v3-basque for full usage instructions, training details, and dataset information.
Model details
| Property | Value |
|---|---|
| Base model | nvidia/parakeet-tdt-0.6b-v3 |
| Architecture | FastConformer-TDT (Token-and-Duration Transducer) |
| Parameters | ~600M |
| Language | Basque (eu) |
| Sample rate | 16 kHz |
| Training data | asierhv/composite_corpus_eu_v2.1 (~676 h) |
| Fine-tuned with | NVIDIA NeMo 2.7 |
| NeMo model | itzune/parakeet-tdt-0.6b-v3-basque |
| Training repo | xezpeleta/parakeet-tdt-0.6b-v3-basque |
License
This model is released under CC BY 4.0, inheriting the license of the base model weights.