parakeet-tdt-0.6b-v3-basque — ONNX Export

ONNX export of itzune/parakeet-tdt-0.6b-v3-basque, a Basque ASR model fine-tuned from nvidia/parakeet-tdt-0.6b-v3 (FastConformer-TDT, 600M parameters).

For NeMo-native inference, use the .nemo model at itzune/parakeet-tdt-0.6b-v3-basque instead.


Evaluation results

Test split   WER (%)   Description
test_cv        6.92    Mozilla Common Voice Basque
test_parl      4.36    Basque Parliament recordings
test_oslr     14.52    OpenSLR Basque corpus

All splits come from the asierhv/composite_corpus_eu_v2.1 corpus; scores are self-reported.

Baseline (no fine-tuning): >100% WER on all splits — Basque is not a supported language in the original parakeet-tdt-0.6b-v3.


Repository contents

This repo contains the ONNX export of the model. Because of the RNNT/TDT architecture and the ONNX protobuf 2 GB size limit, the export consists of multiple files:

encoder-parakeet-tdt-0.6b-v3-basque.onnx   # Encoder graph (~2 MB, references external data)
decoder_joint-parakeet-tdt-0.6b-v3-basque.onnx  # Decoder+Joint graph (~70 MB, references external data)
onnx__MatMul_*   # External tensor weight files (~2.4 GB total, ~291 files)
onnx_export_files.txt   # Manifest listing all files

All files must be present together — the .onnx graphs reference the external weight files by relative path. You cannot use the .onnx files alone.
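Since a missing external weight file only surfaces as a load error later, it can help to verify the download against the manifest first. A minimal sketch, assuming onnx_export_files.txt lists one relative path per line:

```python
from pathlib import Path

def check_export_files(manifest="onnx_export_files.txt", root="."):
    """Return the files listed in the manifest that are missing on disk.

    Assumes the manifest contains one relative file path per line.
    """
    root = Path(root)
    names = [ln.strip() for ln in (root / manifest).read_text().splitlines()
             if ln.strip()]
    return [n for n in names if not (root / n).exists()]
```

Run it from the repo directory; an empty list means the export is complete.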

Why two ONNX graphs?

The RNNT/TDT architecture has two distinct networks with incompatible execution patterns:

  • Encoder: runs once over the full audio spectrogram to produce hidden states.
  • Decoder + Joint: runs iteratively, one step per output token, combining encoder hidden states with the previous predicted token.

These cannot be merged into a single ONNX graph. Any RNNT-capable ONNX inference runtime (NVIDIA Triton, ONNX Runtime with custom RNNT decoding) expects this two-graph layout.
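The loop such a runtime implements can be sketched as follows. Here step_fn is a stand-in for one call to the decoder+joint graph (the real tensor names and state layout depend on the export), and a real TDT decoder additionally reads a predicted duration from the joint to skip encoder frames, which is omitted in this plain-RNNT sketch:

```python
import numpy as np

def greedy_rnnt_decode(encoder_out, step_fn, blank_id, max_symbols=10):
    """Greedy RNNT decoding over encoder frames.

    encoder_out: (T, D) hidden states from the encoder graph.
    step_fn(frame, last_token, state) -> (logits, new_state)
        stands in for one decoder+joint call.
    """
    tokens, state, last = [], None, blank_id
    for frame in encoder_out:
        for _ in range(max_symbols):          # cap symbols emitted per frame
            logits, new_state = step_fn(frame, last, state)
            k = int(np.argmax(logits))
            if k == blank_id:                 # blank: advance to the next frame
                break
            tokens.append(k)                  # non-blank: keep token, update
            last, state = k, new_state        # the prediction-network state
    return tokens
```

The inner loop is why the decoder+joint must be a separate graph: its call count depends on the tokens produced so far, not on the audio length.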


Usage with ONNX Runtime

Note: Running RNNT/TDT decoding manually with ONNX Runtime requires implementing the token-by-token beam/greedy search loop yourself. For most use cases, using the NeMo model directly is simpler.

1. Clone the repo and install dependencies

git lfs install   # the ~2.4 GB of weight files are stored with Git LFS
git clone https://huggingface.co/itzune/parakeet-tdt-0.6b-v3-basque-ONNX
cd parakeet-tdt-0.6b-v3-basque-ONNX
pip install onnxruntime-gpu numpy soundfile

2. Load the encoder and decoder sessions

import onnxruntime as ort

encoder_session = ort.InferenceSession(
    "encoder-parakeet-tdt-0.6b-v3-basque.onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"]
)

decoder_session = ort.InferenceSession(
    "decoder_joint-parakeet-tdt-0.6b-v3-basque.onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"]
)

The external weight files must be in the same directory as the .onnx files when loading.
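Before wiring up the decoding loop, it is worth inspecting the exact input/output tensor names of this export, since they vary between ONNX exports. A small helper using ONNX Runtime's session introspection API:

```python
def describe_io(session):
    """List (name, shape) for a session's inputs and outputs.

    Works on any onnxruntime.InferenceSession; useful for discovering
    the tensor names this particular export uses.
    """
    ins = [(i.name, i.shape) for i in session.get_inputs()]
    outs = [(o.name, o.shape) for o in session.get_outputs()]
    return ins, outs

# e.g. print(describe_io(encoder_session)) after loading the sessions above
```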

3. Preprocessing

The model expects 80-dimensional log-mel spectrograms, matching NeMo's AudioToMelSpectrogramPreprocessor:

  • Sample rate: 16 kHz
  • Window size: 25 ms (400 samples)
  • Window stride: 10 ms (160 samples)
  • Mel bins: 80
  • Normalization: per-feature mean/std (as computed by NeMo preprocessor)
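The parameters above can be approximated with a numpy-only sketch. This is an approximation of NeMo's preprocessor, not a drop-in replacement: the FFT size (512) and HTK-style mel scale are assumptions, and NeMo options such as dither and preemphasis are omitted.

```python
import numpy as np

def log_mel_spectrogram(audio, sr=16000, n_fft=512, win_length=400,
                        hop_length=160, n_mels=80, eps=1e-5):
    """Approximate NeMo-style 80-bin log-mel features (25 ms / 10 ms)."""
    # Frame the signal with a Hann window.
    n_frames = 1 + (len(audio) - win_length) // hop_length
    window = np.hanning(win_length)
    frames = np.stack([audio[i * hop_length:i * hop_length + win_length] * window
                       for i in range(n_frames)])
    # Power spectrum via real FFT, zero-padded to n_fft.
    spec = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2
    # Triangular mel filterbank (HTK mel scale), 0 .. sr/2.
    def hz_to_mel(f): return 2595.0 * np.log10(1.0 + f / 700.0)
    def mel_to_hz(m): return 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mel_pts = mel_to_hz(np.linspace(0.0, hz_to_mel(sr / 2), n_mels + 2))
    bins = np.floor((n_fft + 1) * mel_pts / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        lo, c, hi = bins[m - 1], bins[m], bins[m + 1]
        fbank[m - 1, lo:c] = (np.arange(lo, c) - lo) / max(c - lo, 1)
        fbank[m - 1, c:hi] = (hi - np.arange(c, hi)) / max(hi - c, 1)
    logmel = np.log(spec @ fbank.T + eps)           # (frames, n_mels)
    # Per-feature mean/std normalization over time.
    logmel = (logmel - logmel.mean(axis=0)) / (logmel.std(axis=0) + eps)
    return logmel.T.astype(np.float32)              # (n_mels, frames)

feats = log_mel_spectrogram(np.random.randn(16000))
print(feats.shape)  # → (80, 98) for one second of 16 kHz audio
```

For exact parity with the checkpoint's training-time features, extract them from NeMo's AudioToMelSpectrogramPreprocessor instead.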

Recommended: use the NeMo model

For production inference without reimplementing the RNNT decoding loop, use the .nemo model with the NeMo toolkit (the nemo_toolkit Python package):

pip install "nemo_toolkit[asr]"

import nemo.collections.asr as nemo_asr

model = nemo_asr.models.ASRModel.from_pretrained("itzune/parakeet-tdt-0.6b-v3-basque")
transcriptions = model.transcribe(["audio.wav"])
print(transcriptions[0])

See itzune/parakeet-tdt-0.6b-v3-basque for full usage instructions, training details, and dataset information.


Model details

Property          Value
Base model        nvidia/parakeet-tdt-0.6b-v3
Architecture      FastConformer-TDT (Token-and-Duration Transducer)
Parameters        ~600M
Language          Basque (eu)
Sample rate       16 kHz
Training data     asierhv/composite_corpus_eu_v2.1 (~676 h)
Fine-tuned with   NVIDIA NeMo 2.7
NeMo model        itzune/parakeet-tdt-0.6b-v3-basque
Training repo     xezpeleta/parakeet-tdt-0.6b-v3-basque

License

This model is released under CC BY 4.0, inheriting the license of the base model weights.
