🚁 Drone VLA Toolkit

Train a Vision-Language-Action model for drone navigation using Qwen3.5-0.8B + LoRA + Action Head.

Architecture: FPV Image + Text Instruction → Qwen3.5-0.8B (LoRA) → MLP → [Vx, Vy, Vz, Yaw_rate]
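
The last stage of this pipeline, the MLP that turns backbone features into four velocity commands, can be pictured with a small dependency-free sketch. Everything here is an assumption made for illustration: the layer sizes, the ReLU hidden activation, the tanh output squashing, and the random feature vector standing in for pooled Qwen features. The actual head in `train_drone_vla.py` is built on torch and may differ.

```python
import math
import random

def make_linear(n_in, n_out, rng):
    """Create a randomly initialized linear layer as (weights, biases)."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def linear(x, layer):
    """Apply y = Wx + b to a single feature vector."""
    weights, biases = layer
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]

def action_head(features, hidden_layer, output_layer):
    """Two-layer MLP: features -> hidden (ReLU) -> 4 actions (tanh)."""
    hidden = [max(0.0, v) for v in linear(features, hidden_layer)]
    # tanh keeps every output inside [-1, 1] (an assumed design choice).
    return [math.tanh(v) for v in linear(hidden, output_layer)]

rng = random.Random(0)
FEATURE_DIM, HIDDEN_DIM, ACTION_DIM = 16, 8, 4  # toy sizes, not the model's real dimensions
hidden_layer = make_linear(FEATURE_DIM, HIDDEN_DIM, rng)
output_layer = make_linear(HIDDEN_DIM, ACTION_DIM, rng)

# Stand-in for the pooled backbone features of one (image, instruction) pair.
features = [rng.gauss(0.0, 1.0) for _ in range(FEATURE_DIM)]
action = action_head(features, hidden_layer, output_layer)
print(dict(zip(["Vx", "Vy", "Vz", "Yaw_rate"], action)))
```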

📁 Scripts

Script	What it does
`collect_data.py`	🎮 Captures FPV screenshots + keyboard inputs from your simulator
`convert_to_hf_dataset.py`	📦 Converts collected data → HuggingFace Dataset format
`generate_synthetic_data.py`	🎲 Generates fake data for testing the pipeline
`train_drone_vla.py`	🏋️ Trains Qwen3.5-0.8B + LoRA + action head
`inference_drone_vla.py`	🎯 Runs inference (single image or live from simulator)

🚀 Quick Start

Step 1: Install dependencies

pip install pynput mss pillow numpy datasets torch "transformers>=4.57" "peft>=0.14" accelerate trackio huggingface_hub qwen-vl-utils

Step 2: Collect data from your simulator

# Open your simulator window, then run:
python collect_data.py --output_dir ./drone_data --fps 10 --embodiment drone

# Controls:
#   R = Start/Stop recording episode
#   W/S/A/D = Forward/Back/Left/Right
#   Space/Shift = Up/Down
#   Q/E = Yaw left/right
#   ESC = Quit and save
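
One plausible way the keys above could become an action label is sketched below. The key names, the sign conventions (for example, right = +Vy and yaw right = +Yaw_rate), and the `keys_to_action` helper are hypothetical; check `collect_data.py` for the mapping it actually records.

```python
# Hypothetical mapping: key name -> (index into [Vx, Vy, Vz, Yaw_rate], value).
KEY_TO_AXIS = {
    "w": (0, +1.0),      # forward
    "s": (0, -1.0),      # back
    "a": (1, -1.0),      # strafe left
    "d": (1, +1.0),      # strafe right
    "space": (2, +1.0),  # up
    "shift": (2, -1.0),  # down
    "q": (3, -1.0),      # yaw left
    "e": (3, +1.0),      # yaw right
}

def keys_to_action(pressed):
    """Convert a set of pressed key names into [Vx, Vy, Vz, Yaw_rate]."""
    action = [0.0, 0.0, 0.0, 0.0]
    for key in pressed:
        if key in KEY_TO_AXIS:
            axis, value = KEY_TO_AXIS[key]
            action[axis] += value
    # Opposing keys cancel out; the clamp keeps every axis in [-1, 1].
    return [max(-1.0, min(1.0, v)) for v in action]

print(keys_to_action({"w", "e"}))           # forward while yawing right
print(keys_to_action({"w", "s", "space"}))  # W and S cancel, only climb remains
```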

Step 3: Convert to HF Dataset

python convert_to_hf_dataset.py \
    --source collected \
    --input_dir ./drone_data \
    --output_repo YOUR_USER/drone-nav-data \
    --push_to_hub

Step 4: Train

python train_drone_vla.py \
    --dataset_repo YOUR_USER/drone-nav-data \
    --hub_model_id YOUR_USER/drone-vla-qwen3.5-0.8b \
    --num_epochs 10 \
    --batch_size 4

Step 5: Inference

python inference_drone_vla.py \
    --model_repo YOUR_USER/drone-vla-qwen3.5-0.8b \
    --image frame.jpg \
    --instruction "fly forward through the gate"

🎮 Action Space

The four action values map directly onto the sticks of a standard RC drone controller (Mode 2 layout):

Left Stick                Right Stick
    ↑ Vz (up)                ↑ Vx (forward)
    ↓ Vz (down)              ↓ Vx (backward)
    ← Yaw (rotate left)      ← Vy (strafe left)
    → Yaw (rotate right)     → Vy (strafe right)

All values are normalized to [-1, 1].
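
Before a normalized action is sent to a vehicle or simulator, it usually has to be clipped and scaled to physical units. A minimal sketch follows, assuming hypothetical per-axis limits; `MAX_SPEED` and `denormalize` are illustrative names that do not come from the toolkit.

```python
# Hypothetical limits: m/s for the three velocities, rad/s for the yaw rate.
MAX_SPEED = {"vx": 5.0, "vy": 5.0, "vz": 2.0, "yaw_rate": 1.5}

def denormalize(action):
    """Clip [Vx, Vy, Vz, Yaw_rate] to [-1, 1], then scale to physical units."""
    limits = [MAX_SPEED["vx"], MAX_SPEED["vy"], MAX_SPEED["vz"], MAX_SPEED["yaw_rate"]]
    clipped = [max(-1.0, min(1.0, a)) for a in action]
    return [c * limit for c, limit in zip(clipped, limits)]

# The out-of-range Vz of -1.2 is clipped to -1.0 before scaling.
print(denormalize([0.5, 0.0, -1.2, 1.0]))
```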

🤖 Multi-Embodiment

The same action space works across robots; axes a platform cannot use are fixed at 0:

Drone: [Vx, Vy, Vz, Yaw_rate], all 4 active
Car: [Vx, 0, 0, Yaw_rate], no strafe, no altitude
Dog: [Vx, Vy, 0, Yaw_rate], no altitude
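
The per-robot patterns above amount to a fixed mask over the four action components. A minimal sketch of that idea follows; the `EMBODIMENT_MASK` table and `mask_action` helper are hypothetical, and only `drone` is confirmed by this README as a value for `--embodiment`.

```python
# 1 = axis is active for this embodiment, 0 = axis is forced to zero.
# Order: [Vx, Vy, Vz, Yaw_rate]
EMBODIMENT_MASK = {
    "drone": [1, 1, 1, 1],
    "car":   [1, 0, 0, 1],  # no strafe, no altitude
    "dog":   [1, 1, 0, 1],  # no altitude
}

def mask_action(action, embodiment):
    """Zero out the action components the given embodiment cannot execute."""
    mask = EMBODIMENT_MASK[embodiment]
    return [a if m else 0.0 for a, m in zip(action, mask)]

raw = [0.8, -0.3, 0.5, 0.1]
for name in EMBODIMENT_MASK:
    print(name, mask_action(raw, name))
```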

📚 Based On

RaceVLA: LoRA and a drone velocity action space
OpenVLA-OFT: L1 regression on continuous actions, preferred over discrete action tokens
SmolVLA: small VLA architecture patterns
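
For readers unfamiliar with the L1 regression objective mentioned above, it is simply the mean absolute error between predicted and recorded actions. The sketch below is a plain-Python illustration under that definition, with made-up numbers; it is not taken from `train_drone_vla.py`, which may weight or reduce the loss differently.

```python
def l1_loss(predicted, target):
    """Mean absolute error over a batch of [Vx, Vy, Vz, Yaw_rate] actions."""
    total, count = 0.0, 0
    for pred_row, target_row in zip(predicted, target):
        for p, t in zip(pred_row, target_row):
            total += abs(p - t)
            count += 1
    return total / count

# Two illustrative samples: model predictions vs. recorded pilot inputs.
predicted = [[0.5, 0.0, 0.0, 0.0], [1.0, 0.0, 0.5, -0.5]]
target = [[1.0, 0.0, 0.0, 0.0], [1.0, 0.0, 0.0, 0.0]]
print(l1_loss(predicted, target))
```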
