# Qwen2.5-VL-7B Arabic VQA
Fine-tuned version of Qwen/Qwen2.5-VL-7B-Instruct on Arabic Visual Question Answering data as part of the ArabicVL-R project.
## Model Description
This model is fine-tuned to answer visual questions in Arabic, reasoning over image content and responding in Arabic text.
## Training Data

- Dataset: Arabic LLaVA — available at ArabicVL-R
- Total samples: 5,000
- Split: 80% train / 10% validation / 10% test
  - Train: 4,000 samples
  - Validation: 500 samples
  - Test: 500 samples
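The 80/10/10 split above can be reproduced with a simple shuffle-and-slice; the seed and index-based approach here are assumptions for illustration, not the project's actual preprocessing code:

```python
import random

# Placeholder indices standing in for the 5,000 QA samples.
samples = list(range(5000))
random.Random(42).shuffle(samples)  # fixed seed is an assumption

train = samples[:4000]        # 80%
val = samples[4000:4500]      # 10%
test = samples[4500:]         # 10%

print(len(train), len(val), len(test))
```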
## Evaluation Results
Evaluated on the test split (503 samples) using Exact Match:
| Metric | Score |
|---|---|
| Exact Match | 0.4334 |
| Accuracy | 43.34% |
| Correct / Total | 218 / 503 |
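The card does not specify how predictions are normalized before comparison; a minimal sketch of Exact Match scoring, assuming whitespace-stripped string equality:

```python
def exact_match(pred: str, ref: str) -> bool:
    # Normalization is an assumption: strip surrounding whitespace only.
    return pred.strip() == ref.strip()

# Toy Arabic predictions vs. references (illustrative, not from the dataset).
preds = ["قطة", "كلب", "سيارة"]
refs = ["قطة", "قط", "سيارة"]

score = sum(exact_match(p, r) for p, r in zip(preds, refs)) / len(refs)
print(round(score, 4))  # 2 of 3 exact → 0.6667
```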
## Usage

```python
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration
from PIL import Image
import torch

model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "Manar01/qwen-vl-arabic",
    torch_dtype=torch.float16,
    device_map="auto",
)
processor = AutoProcessor.from_pretrained("Manar01/qwen-vl-arabic")

image = Image.open("image.jpg").convert("RGB")
messages = [{
    "role": "user",
    "content": [
        {"type": "image", "image": image},
        {"type": "text", "text": "your question in Arabic"},
    ],
}]

# Build the chat prompt, then tokenize text and image together.
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[text], images=[image], return_tensors="pt").to(model.device)

with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=256, do_sample=False)

# Decode only the newly generated tokens, skipping the prompt.
generated = output[0][inputs["input_ids"].shape[1]:]
print(processor.decode(generated, skip_special_tokens=True))
```
## Authors
- Sarah Aldumiji
- Manar Alrabie
- Hadeel Alseni
- Ragheed Samkari
- Mourad Mars
## Citation

```bibtex
@misc{arabicvlr2026,
  title  = {ArabicVL-R: Arabic Vision Language Model Reasoning},
  author = {Aldumiji, Sarah and Alrabie, Manar and Alseni, Hadeel and Samkari, Ragheed and Mars, Mourad},
  year   = {2025},
  url    = {https://github.com/hadeelalseni/ArabicVL-R}
}
```
## License
Apache 2.0