Model Card for Whisper Darija (Fine-Tuned)
This is an OpenAI Whisper small model fine-tuned for Moroccan Darija speech transcription. It transcribes Moroccan dialectal Arabic audio into text.
Model Details
Model Description
This model is a fine-tuned version of giannitto/whisper-morocco-model, trained on a dataset of Moroccan Darija audio and transcriptions. Fine-tuning aimed to lower the model's Word Error Rate (WER) on spoken Darija, which is underrepresented in many multilingual speech models.
- Developed by: Bentaleb Ali
- Model type: Automatic Speech Recognition (ASR)
- Language(s): Moroccan Darija (Arabic dialect)
- License: Apache 2.0
- Finetuned from model: giannitto/whisper-morocco-model
Model Sources
Uses
Direct Use
This model is intended for transcription of Moroccan Darija audio into text. It can be used in:
- Voice assistants
- Media subtitling
- Dialectal speech processing
- Linguistic research
Out-of-Scope Use
- Translation tasks (this model is for transcription, not translation)
- Other Arabic dialects outside Moroccan Darija
Bias, Risks, and Limitations
- The model may perform poorly on noisy or low-quality recordings.
- The model may not generalize well to other dialects of Arabic.
- Biases in the training data (e.g., gender, age, region) may affect transcription accuracy.
Recommendations
Carefully evaluate outputs when using the model in sensitive applications. Avoid using it in high-risk domains without human verification.
How to Get Started with the Model
import torch
import torchaudio
from transformers import AutoProcessor, AutoModelForSpeechSeq2Seq

# Load model and processor
processor = AutoProcessor.from_pretrained("TaloCreations/whisper-darija-finetuned")
model = AutoModelForSpeechSeq2Seq.from_pretrained("TaloCreations/whisper-darija-finetuned")
model.eval()

# Load the recording and resample it to the 16 kHz rate the model expects
speech, sr = torchaudio.load("path_to_record.wav")
if sr != 16000:
    resampler = torchaudio.transforms.Resample(orig_freq=sr, new_freq=16000)
    speech = resampler(speech)

# Preprocess (first channel only) and generate
inputs = processor(speech[0].numpy(), sampling_rate=16000, return_tensors="pt")
with torch.no_grad():
    generated_ids = model.generate(**inputs)
transcription = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
print("Transcription:", transcription)
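The snippet above passes one recording to the processor, and Whisper works on a fixed 30-second input window, so longer recordings generally need to be split first. The following is a minimal sketch of one way to compute window boundaries, assuming simple non-overlapping cuts; the helper name `chunk_bounds` is hypothetical and is not part of this model or of any library.

```python
SAMPLE_RATE = 16_000   # Hz, the rate used in the snippet above
WINDOW_SECONDS = 30    # Whisper's fixed input window


def chunk_bounds(num_samples, sample_rate=SAMPLE_RATE, window_seconds=WINDOW_SECONDS):
    """Return (start, end) sample indices of consecutive, non-overlapping windows.

    Hypothetical helper for illustration: the last window may be shorter
    than the others, and cuts can fall in the middle of a word.
    """
    window = sample_rate * window_seconds
    return [
        (start, min(start + window, num_samples))
        for start in range(0, num_samples, window)
    ]


# Boundaries for a 70-second recording sampled at 16 kHz
bounds = chunk_bounds(70 * SAMPLE_RATE)
```

Each `(start, end)` pair can then be used to slice the waveform, for example `speech[0][start:end]`, before handing the slice to the processor. Because hard cuts may split words, overlapping windows or silence-based splitting may give better transcripts in practice.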
Training Details
Training Data
The model was trained on:
These datasets contain manually transcribed audio samples of Moroccan Darija.
Training Procedure
Preprocessing
- All audio was resampled to 16 kHz
- Mel spectrograms were padded to 3000 frames (30 s max)
- Transcripts were tokenized and clipped to <=448 tokens
- Decoder prompts were injected to ensure language/task alignment
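The length limits in the list above can be sketched in plain Python. This only illustrates the padding and clipping rules and is not the author's preprocessing code: the helper names are hypothetical, and the zero pad value and the 80-bin feature width in the example are assumptions.

```python
N_FRAMES = 3000    # 30 s of audio at a 10 ms frame hop
MAX_TOKENS = 448   # transcript length limit from the list above


def pad_or_trim_frames(frames, n_frames=N_FRAMES, pad_value=0.0):
    """Pad (or cut) a list of per-frame feature vectors to exactly n_frames.

    pad_value=0.0 is an assumption made for this sketch.
    """
    if len(frames) >= n_frames:
        return frames[:n_frames]
    width = len(frames[0]) if frames else 0
    padding = [[pad_value] * width for _ in range(n_frames - len(frames))]
    return frames + padding


def clip_tokens(token_ids, max_tokens=MAX_TOKENS):
    """Keep at most max_tokens token ids from the start of the transcript."""
    return token_ids[:max_tokens]


# Example: 5 frames of 80 features each are padded out to 3000 frames
example_frames = [[1.0] * 80 for _ in range(5)]
padded = pad_or_trim_frames(example_frames)
clipped = clip_tokens(list(range(500)))
```

In a real pipeline the processor loaded from the model repository would normally handle feature extraction and tokenization; the sketch only makes the two size limits explicit.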
Training Hyperparameters
- Batch size: 8 (gradient accumulation = 2)
- Epochs: 10
- Learning rate: 2e-6
- Mixed precision: fp16
- Weight decay: 0.01
- Warmup steps: 500
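Two consequences of these settings can be worked out directly: the effective batch size per optimizer step and the learning rate during warmup. The sketch below assumes a single GPU (as listed under Compute Infrastructure) and a linear warmup shape, which the card does not state; whatever happens to the learning rate after warmup is not modeled here.

```python
PER_DEVICE_BATCH = 8
GRAD_ACCUM_STEPS = 2
PEAK_LR = 2e-6
WARMUP_STEPS = 500

# Samples contributing to each optimizer update on one GPU
effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM_STEPS


def warmup_lr(step, peak_lr=PEAK_LR, warmup_steps=WARMUP_STEPS):
    """Learning rate under an assumed linear warmup from 0 to peak_lr.

    After warmup_steps the rate is held at peak_lr; any later decay
    schedule is deliberately left out of this sketch.
    """
    return peak_lr * min(step, warmup_steps) / warmup_steps


halfway_lr = warmup_lr(250)  # half of the peak rate under linear warmup
```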
Evaluation
Testing Data, Factors & Metrics
Testing Data
A held-out subset (10%) of the training datasets.
Metrics
- Word Error Rate (WER)
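For reference, WER is the word-level edit distance between the reference and the predicted transcript, divided by the number of reference words. The following is a minimal sketch of that definition for a single utterance. It is not necessarily how the reported numbers were computed: the function name is hypothetical, and the evaluation may have used a library and text normalization that are not described here.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1] / len(ref)


# One substituted word out of four reference words
score = word_error_rate("a b c d", "a x c d")
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.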
Results
Training Progress
| Epoch | Training Loss | Validation Loss | Word Error Rate (WER) |
|---|---|---|---|
| 1 | 0.905000 | 0.831409 | 0.825147 |
| 2 | 0.773200 | 0.712022 | 0.732625 |
| 3 | 0.658900 | 0.652096 | 0.631158 |
| 4 | 0.609100 | 0.608619 | 0.578152 |
| 5 | 0.548400 | 0.579711 | 0.546444 |
| 6 | 0.509700 | 0.561768 | 0.524927 |
| 7 | 0.482000 | 0.551717 | 0.522067 |
| 8 | 0.459400 | 0.545695 | 0.526979 |
| 9 | 0.446500 | 0.543017 | 0.497141 |
| 10 | 0.443200 | 0.542152 | 0.504545 |
Summary
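After 10 epochs, the model reached a WER of about 50% on the held-out set (0.505 at epoch 10, with the lowest value, 0.497, at epoch 9), down from 0.825 after the first epoch. The author describes this as a significant improvement over baseline multilingual Whisper models on Moroccan Darija.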
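The per-epoch WER values from the table can be used to pick the checkpoint with the lowest WER and to quantify the change over training. This is a small sketch over the reported numbers only; the variable names are illustrative.

```python
# WER per epoch, copied from the Training Progress table
wer_by_epoch = {
    1: 0.825147, 2: 0.732625, 3: 0.631158, 4: 0.578152, 5: 0.546444,
    6: 0.524927, 7: 0.522067, 8: 0.526979, 9: 0.497141, 10: 0.504545,
}

# Epoch with the lowest validation WER
best_epoch = min(wer_by_epoch, key=wer_by_epoch.get)

# Relative WER reduction between the first and the last epoch
relative_reduction = (wer_by_epoch[1] - wer_by_epoch[10]) / wer_by_epoch[1]

print(f"best epoch: {best_epoch}, WER {wer_by_epoch[best_epoch]:.3f}")
print(f"relative WER reduction, epoch 1 to 10: {relative_reduction:.1%}")
```

WER is not monotonic in the last epochs (it rises at epochs 8 and 10), so selecting a checkpoint by validation WER rather than taking the final one may be worth considering.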
Environmental Impact
Estimated based on training on a single A100 GPU for ~6.5 hours.
- Hardware Type: A100
- Hours used: ~6.5
- Cloud Provider: Google Cloud (Colab)
- Compute Region: Morocco
Technical Specifications
Model Architecture and Objective
- Whisper (small) encoder-decoder architecture
- Objective: sequence-to-sequence transcription
Compute Infrastructure
- Google Colab Pro
- 1x A100 GPU
- PyTorch + Transformers 4.39
Citation
title={Whisper Darija: Fine-tuned Whisper Model for Moroccan Arabic Speech},
author={Bentaleb, Ali},
year={2025},
}
Model Card Authors
- Ali Bentaleb @TaloCreations
Model Card Contact