๐Ÿš Drone VLA Toolkit

Train a Vision-Language-Action model for drone navigation using Qwen3.5-0.8B + LoRA + Action Head.

Architecture: FPV Image + Text Instruction → Qwen3.5-0.8B (LoRA) → MLP → [Vx, Vy, Vz, Yaw_rate]

๐Ÿ“ Scripts

| Script | What it does |
| --- | --- |
| collect_data.py | 🎮 Captures FPV screenshots + keyboard inputs from your simulator |
| convert_to_hf_dataset.py | 📦 Converts collected data → HuggingFace Dataset format |
| generate_synthetic_data.py | 🎲 Generates fake data for testing the pipeline |
| train_drone_vla.py | 🏋️ Trains Qwen3.5-0.8B + LoRA + action head |
| inference_drone_vla.py | 🎯 Runs inference (single image or live from simulator) |

🚀 Quick Start

Step 1: Install dependencies

# Quote the version specifiers so the shell does not treat ">" as output redirection
pip install pynput mss pillow numpy datasets torch "transformers>=4.57" "peft>=0.14" accelerate trackio huggingface_hub qwen-vl-utils

Step 2: Collect data from your simulator

# Open your simulator window, then run:
python collect_data.py --output_dir ./drone_data --fps 10 --embodiment drone

# Controls:
#   R = Start/Stop recording episode
#   W/S/A/D = Forward/Back/Left/Right
#   Space/Shift = Up/Down
#   Q/E = Yaw left/right
#   ESC = Quit and save
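
To make the key layout above concrete, here is a minimal sketch of how pressed keys could be turned into a `[Vx, Vy, Vz, Yaw]` action. It is not taken from `collect_data.py`: the `KEY_BINDINGS` table, the `keys_to_action` helper, the key names, and the sign convention (forward, right, up, and yaw-right as positive) are all assumptions made for illustration.

```python
# Hypothetical mapping from key name to (action index, value).
# Action layout: [Vx, Vy, Vz, Yaw]. Signs are assumed, not confirmed by the toolkit.
KEY_BINDINGS = {
    "w": (0, +1.0),      # forward
    "s": (0, -1.0),      # back
    "d": (1, +1.0),      # strafe right
    "a": (1, -1.0),      # strafe left
    "space": (2, +1.0),  # up
    "shift": (2, -1.0),  # down
    "e": (3, +1.0),      # yaw right
    "q": (3, -1.0),      # yaw left
}


def keys_to_action(pressed):
    """Convert a set of pressed key names into a 4-element action in [-1, 1]."""
    action = [0.0, 0.0, 0.0, 0.0]
    for key in pressed:
        if key in KEY_BINDINGS:
            axis, value = KEY_BINDINGS[key]
            action[axis] += value
    # Opposite keys cancel out; clamp in case several bindings share an axis.
    return [max(-1.0, min(1.0, v)) for v in action]


print(keys_to_action({"w", "e"}))  # forward while yawing right
```

Keys without a binding (such as `R`, which toggles recording) are ignored, and pressing two opposing keys at once yields zero on that axis.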

Step 3: Convert to HF Dataset

python convert_to_hf_dataset.py \
    --source collected \
    --input_dir ./drone_data \
    --output_repo YOUR_USER/drone-nav-data \
    --push_to_hub

Step 4: Train

python train_drone_vla.py \
    --dataset_repo YOUR_USER/drone-nav-data \
    --hub_model_id YOUR_USER/drone-vla-qwen3.5-0.8b \
    --num_epochs 10 \
    --batch_size 4

Step 5: Inference

python inference_drone_vla.py \
    --model_repo YOUR_USER/drone-vla-qwen3.5-0.8b \
    --image frame.jpg \
    --instruction "fly forward through the gate"

🎮 Action Space

Maps directly to a standard RC drone controller:

Left Stick                Right Stick
    ↑ Vz (up)                ↑ Vx (forward)
    ↓ Vz (down)              ↓ Vx (backward)
    ← Yaw (rotate left)      ← Vy (strafe left)
    → Yaw (rotate right)     → Vy (strafe right)

All values are normalized to [-1, 1].
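
Because the outputs are normalized, they need to be scaled before being sent to a simulator or flight controller. The sketch below shows one way to do that. The `to_velocity_command` helper and the default limits (5 m/s horizontal, 2 m/s vertical, 90 deg/s yaw) are hypothetical values chosen for illustration, not settings from this toolkit.

```python
def to_velocity_command(action, limits=(5.0, 5.0, 2.0, 90.0)):
    """Scale a normalized [Vx, Vy, Vz, Yaw] action to physical units.

    `limits` holds assumed per-axis maxima: m/s for the three velocities
    and deg/s for the yaw rate. Replace them with your platform's values.
    """
    if len(action) != 4:
        raise ValueError("expected 4 action values: [Vx, Vy, Vz, Yaw]")
    # Clamp first so an out-of-range prediction cannot exceed the limits.
    clipped = [max(-1.0, min(1.0, float(a))) for a in action]
    return [c * m for c, m in zip(clipped, limits)]


print(to_velocity_command([0.5, -1.0, 2.0, 0.0]))
```

Clamping before scaling keeps a stray model output (here the `2.0` on Vz) from commanding more than the configured maximum.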

🤖 Multi-Embodiment

Same action space works across robots:

  • Drone: [Vx, Vy, Vz, Yaw] (all 4 active)
  • Car: [Vx, 0, 0, Yaw] (no strafe, no altitude)
  • Dog: [Vx, Vy, 0, Yaw] (no altitude)
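
One way to express the list above in code is a per-embodiment mask that zeroes the inactive axes. This is only a sketch: `EMBODIMENT_MASKS` and `mask_action` are hypothetical names, and the toolkit's scripts may handle the `--embodiment` option differently.

```python
# 1 = axis is active, 0 = axis is forced to zero. Layout: [Vx, Vy, Vz, Yaw].
EMBODIMENT_MASKS = {
    "drone": (1, 1, 1, 1),  # all 4 active
    "car": (1, 0, 0, 1),    # no strafe, no altitude
    "dog": (1, 1, 0, 1),    # no altitude
}


def mask_action(action, embodiment):
    """Zero out the action components the given embodiment cannot execute."""
    mask = EMBODIMENT_MASKS[embodiment]  # raises KeyError for unknown embodiments
    return [a if m else 0.0 for a, m in zip(action, mask)]


print(mask_action([0.5, 0.2, -0.3, 0.1], "car"))
```

Applying the same mask to recorded labels and to model predictions would keep the two consistent for each robot type.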

📚 Based On

  • RaceVLA: LoRA + drone velocity action space
  • OpenVLA-OFT: L1 regression >> discrete tokens
  • SmolVLA: Small VLA architecture patterns
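
For readers unfamiliar with the L1 regression objective mentioned above, the sketch below computes it for a single 4-dimensional action. It assumes the action head regresses continuous values directly. The `l1_loss` helper is illustrative and is not the loss code from `train_drone_vla.py`.

```python
def l1_loss(predicted, target):
    """Mean absolute error between a predicted and a target action vector."""
    if len(predicted) != len(target):
        raise ValueError("predicted and target must have the same length")
    return sum(abs(p - t) for p, t in zip(predicted, target)) / len(predicted)


# Predicted vs. recorded [Vx, Vy, Vz, Yaw]
print(l1_loss([1.0, 0.0, 0.0, 0.0], [0.5, 0.0, 0.0, 0.5]))
```

Unlike a discrete-token formulation, this treats each velocity as a continuous quantity, so a prediction that is slightly off is penalized only slightly.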