Qwen2.5-VL-7B Arabic VQA

A fine-tuned version of Qwen/Qwen2.5-VL-7B-Instruct, trained on Arabic Visual Question Answering (VQA) data as part of the ArabicVL-R project.

Model Description

This model is fine-tuned to answer questions about images in Arabic: given an image and an Arabic question, it reasons over the visual content and generates an Arabic text response.

Training Data

  • Dataset: Arabic LLaVA — available at ArabicVL-R
  • Total samples: 5,000
  • Split: 80% train / 10% validation / 10% test
  • Train: 4,000 samples
  • Validation: 500 samples
  • Test: 500 samples
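The split sizes above follow directly from the 80/10/10 ratios applied to 5,000 samples. A minimal sketch of one way to produce such a split is shown below. The shuffling approach, the seed value, and the function name split_indices are assumptions made for illustration and are not confirmed to be the procedure used for this model.

```python
import random


def split_indices(n_samples, train_pct=80, val_pct=10, seed=42):
    """Shuffle indices 0..n_samples-1 and cut them into train/val/test lists.

    The seed and the shuffle-then-slice strategy are illustrative assumptions.
    Integer percentages avoid floating-point rounding when sizing each part.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)

    n_train = n_samples * train_pct // 100
    n_val = n_samples * val_pct // 100

    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # whatever remains goes to the test set
    return train, val, test


train_idx, val_idx, test_idx = split_indices(5000)
print(len(train_idx), len(val_idx), len(test_idx))  # 4000 500 500
```

Fixing the seed keeps the three subsets reproducible across runs, which matters when the test split is used for the reported score.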

Evaluation Results

Evaluated on the test split (503 samples) using Exact Match:

Metric            Score
Exact Match       0.4334
Accuracy          43.34%
Correct / Total   218 / 503
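For reference, Exact Match counts a prediction as correct only when it equals the reference answer as a string. The sketch below shows one possible way to compute it. The whitespace normalization and the helper names normalize and exact_match are assumptions for illustration, since the card does not state how answers were normalized before comparison.

```python
def normalize(text):
    """Strip and collapse whitespace (an assumed, minimal normalization)."""
    return " ".join(text.split())


def exact_match(predictions, references):
    """Return (correct, total, score) for paired predictions and references."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    total = len(references)
    correct = sum(
        normalize(p) == normalize(r) for p, r in zip(predictions, references)
    )
    score = correct / total if total else 0.0
    return correct, total, score


# Toy example: the second prediction differs only by trailing whitespace,
# the third is a different answer.
preds = ["قطة", "ثلاثة ", "أحمر"]
refs = ["قطة", "ثلاثة", "أزرق"]
correct, total, score = exact_match(preds, refs)
print(correct, total)  # 2 3

# The reported score is consistent with the reported counts.
print(round(218 / 503, 4))  # 0.4334
```

Because the metric is strict string equality, an answer that is semantically right but worded differently still counts as wrong, so the score is likely a conservative estimate of answer quality.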

Usage

from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration
from PIL import Image
import torch

# Load the fine-tuned model and its processor from the Hugging Face Hub.
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "Manar01/qwen-vl-arabic",
    torch_dtype=torch.float16,
    device_map="auto"
)
processor = AutoProcessor.from_pretrained("Manar01/qwen-vl-arabic")

# Load the input image and make sure it is in RGB mode.
image = Image.open("image.jpg").convert("RGB")

# Build a single-turn chat message containing the image and the question.
messages = [{
    "role": "user",
    "content": [
        {"type": "image", "image": image},
        {"type": "text", "text": "your question in Arabic"},
    ],
}]

# Render the chat template to a prompt string, then encode text and image together.
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[text], images=[image], return_tensors="pt").to(model.device)

# Greedy decoding (do_sample=False) gives deterministic answers.
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=256, do_sample=False)

# Drop the prompt tokens so that only the generated answer is decoded.
generated = output[0][inputs["input_ids"].shape[1]:]
print(processor.decode(generated, skip_special_tokens=True))

Authors

  • Sarah Aldumiji
  • Manar Alrabie
  • Hadeel Alseni
  • Ragheed Samkari
  • Mourad Mars

Citation

@misc{arabicvlr2026,
  title  = {ArabicVL-R: Arabic Vision Language Model Reasoning},
  author = {Aldumiji, Sarah and Alrabie, Manar and Alseni, Hadeel and Samkari, Ragheed and Mars, Mourad},
  year   = {2025},
  url    = {https://github.com/hadeelalseni/ArabicVL-R}
}

License

Apache 2.0
