Qwen3-ASR Uzbek (v2)

Fine-tuned Qwen/Qwen3-ASR-1.7B for Uzbek speech-to-text.

Model Details

  • Base model: Qwen/Qwen3-ASR-1.7B
  • Language: Uzbek (uz)
  • Training data: ~160K samples from FLEURS + USC + YouTube (IT, News, Podcasts)
  • Training: 5 epochs, lr=1e-5, resumed from the v1 checkpoint-2433

Usage

import torch
from qwen_asr import Qwen3ASRModel

# Load the fine-tuned model together with the forced aligner used for timestamps.
model = Qwen3ASRModel.from_pretrained(
    "Gearnode/qwen3-asr-uzbek-v2",
    device_map="cuda:0",
    dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
    max_new_tokens=448,
    forced_aligner="Qwen/Qwen3-ForcedAligner-0.6B",
    forced_aligner_kwargs=dict(dtype=torch.bfloat16, device_map="cuda:0"),
)

# audio_array: mono float waveform sampled at 16 kHz
results = model.transcribe(
    audio=[(audio_array, 16000)],
    language=["Uzbek"],
    return_time_stamps=True,
)
print(results[0].text)
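The snippet above assumes `audio_array` is already a mono float waveform at 16 kHz. A minimal sketch of preparing one from a PCM WAV file, using only the standard library plus NumPy (the `load_audio_16k` helper is illustrative, not part of this model's API):

```python
import wave
import numpy as np

def load_audio_16k(path):
    """Read a 16-bit PCM WAV file into float32 mono samples at 16 kHz."""
    with wave.open(path, "rb") as wf:
        sr = wf.getframerate()
        n_ch = wf.getnchannels()
        raw = wf.readframes(wf.getnframes())
    data = np.frombuffer(raw, dtype=np.int16).astype(np.float32) / 32768.0
    if n_ch > 1:
        data = data.reshape(-1, n_ch).mean(axis=1)  # downmix to mono
    if sr != 16000:
        # naive linear resampling; prefer librosa/torchaudio for better quality
        target = int(len(data) * 16000 / sr)
        data = np.interp(np.linspace(0, len(data) - 1, target),
                         np.arange(len(data)), data)
    return data, 16000
```

The returned pair can be passed directly as one element of the `audio` list in `model.transcribe`.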

Training Datasets

  • Google FLEURS uz_uz (2,943 samples)
  • Uzbek Speech Corpus (100,767 samples)
  • YouTube IT Uzbek (21,016 samples)
  • YouTube News Uzbek (20,795 samples)
  • YouTube Podcasts Uzbek (14,547 samples)
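The per-source counts above add up to the "~160K samples" figure quoted in Model Details:

```python
# Sample counts per training source, as listed in this card.
counts = {
    "Google FLEURS uz_uz": 2_943,
    "Uzbek Speech Corpus": 100_767,
    "YouTube IT Uzbek": 21_016,
    "YouTube News Uzbek": 20_795,
    "YouTube Podcasts Uzbek": 14_547,
}
total = sum(counts.values())
print(total)  # 160068, i.e. the ~160K total
```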

Evaluation

Model                   Uzbek quality   Notes
This model              Best            Some repetition, handled by post-processing
Meta MMS (mms-1b-all)   Passable        Use language code uzb-script_latin
Whisper large-v3        Poor            Hallucination loops on Uzbek
Base Qwen3-ASR          Poor            Mixed-language output
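The repetition noted for this model can be collapsed with a simple post-processing pass. A minimal sketch (the `collapse_repeats` helper is hypothetical, not the exact post-processing used here):

```python
import re

def collapse_repeats(text, max_phrase_words=4):
    """Collapse a word or short phrase repeated back-to-back,
    a common ASR decoding-loop artifact."""
    for n in range(max_phrase_words, 0, -1):
        # match a group of n words immediately repeated one or more times
        pattern = re.compile(r"\b((?:\S+\s+){%d}\S+)(?:\s+\1)+\b" % (n - 1))
        text = pattern.sub(r"\1", text)
    return text
```

For example, `collapse_repeats("bu juda juda yaxshi")` returns `"bu juda yaxshi"`; longer repeated phrases up to `max_phrase_words` words are collapsed the same way.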

License

Apache 2.0 (same as base model)
