Model Card for Whisper Darija (Fine-Tuned)

This is a fine-tuned version of OpenAI's Whisper small model for Moroccan Darija speech transcription. It is trained to transcribe Moroccan dialectal Arabic audio into text.

Model Details

Model Description

This model is a fine-tuned version of giannitto/whisper-morocco-model, trained on a dataset of Moroccan Darija audio and transcriptions. The fine-tuning process aimed to reduce the model's Word Error Rate (WER) on spoken Darija, which is underrepresented in many multilingual speech models.

  • Developed by: Bentaleb Ali
  • Model type: Automatic Speech Recognition (ASR)
  • Language(s): Moroccan Darija (Arabic dialect)
  • License: Apache 2.0
  • Finetuned from model: giannitto/whisper-morocco-model

Uses

Direct Use

This model is intended for transcription of Moroccan Darija audio into text. It can be used in:

  • Voice assistants
  • Media subtitling
  • Dialectal speech processing
  • Linguistic research

Out-of-Scope Use

  • Translation tasks (this model is for transcription, not translation)
  • Other Arabic dialects outside Moroccan Darija

Bias, Risks, and Limitations

  • The model may perform poorly on noisy or low-quality recordings.
  • The model may not generalize well to other dialects of Arabic.
  • Biases in the training data (e.g., gender, age, region) may affect transcription accuracy.

Recommendations

Carefully evaluate outputs before using the model in sensitive applications, and do not deploy it in high-risk domains without human verification of transcripts.

How to Get Started with the Model

from transformers import AutoProcessor, AutoModelForSpeechSeq2Seq
import torch, torchaudio

# Load model and processor
processor = AutoProcessor.from_pretrained("TaloCreations/whisper-darija-finetuned")
model = AutoModelForSpeechSeq2Seq.from_pretrained("TaloCreations/whisper-darija-finetuned")
model.eval()

# Load the recording; torchaudio returns a (channels, samples) tensor
speech, sr = torchaudio.load("path_to_record.wav")

# Whisper expects 16 kHz mono audio
if speech.shape[0] > 1:
    speech = speech.mean(dim=0, keepdim=True)
if sr != 16000:
    resampler = torchaudio.transforms.Resample(orig_freq=sr, new_freq=16000)
    speech = resampler(speech)

# Preprocess and generate
inputs = processor(speech[0], sampling_rate=16000, return_tensors="pt")
with torch.no_grad():
    generated_ids = model.generate(**inputs)
    transcription = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]

print("πŸ“’ Transcription:", transcription)

Training Details

Training Data

The model was trained on datasets of manually transcribed audio samples of Moroccan Darija.

Training Procedure

Preprocessing

  • All audio was resampled to 16kHz
  • Mel spectrograms were padded to 3000 frames (30s max)
  • Transcripts were tokenized and clipped to <=448 tokens
  • Decoder prompts were injected to ensure language/task alignment

Training Hyperparameters

  • Batch size: 8 (gradient accumulation = 2)
  • Epochs: 10
  • Learning rate: 2e-6
  • Mixed precision: fp16
  • Weight decay: 0.01
  • Warmup steps: 500

Evaluation

Testing Data, Factors & Metrics

Testing Data

A held-out subset (10%) of the training datasets.

Metrics

  • Word Error Rate (WER)

Results

πŸ“Š Training Progress

| Epoch | Training Loss | Validation Loss | Word Error Rate (WER) |
|-------|---------------|-----------------|-----------------------|
| 1     | 0.905000      | 0.831409        | 0.825147              |
| 2     | 0.773200      | 0.712022        | 0.732625              |
| 3     | 0.658900      | 0.652096        | 0.631158              |
| 4     | 0.609100      | 0.608619        | 0.578152              |
| 5     | 0.548400      | 0.579711        | 0.546444              |
| 6     | 0.509700      | 0.561768        | 0.524927              |
| 7     | 0.482000      | 0.551717        | 0.522067              |
| 8     | 0.459400      | 0.545695        | 0.526979              |
| 9     | 0.446500      | 0.543017        | 0.497141              |
| 10    | 0.443200      | 0.542152        | 0.504545              |

Summary

After 10 epochs, the model reached a WER of roughly 0.50 (best: 0.497 at epoch 9), a significant improvement over baseline multilingual Whisper models on Moroccan Darija.

Environmental Impact

Estimated based on training on a single A100 GPU for ~6.5 hours.

  • Hardware Type: A100
  • Hours used: ~6.5
  • Cloud Provider: Google Cloud (Colab)
  • Compute Region: Morocco

Technical Specifications

Model Architecture and Objective

  • Whisper (small) encoder-decoder architecture
  • Objective: sequence-to-sequence transcription

Compute Infrastructure

  • Google Colab Pro
  • 1x A100 GPU
  • PyTorch + Transformers 4.39

Citation

@misc{whisper_darija_2025,
  title={Whisper Darija: Fine-tuned Whisper Model for Moroccan Arabic Speech},
  author={Bentaleb, Ali},
  year={2025},
}

