dispatchAI SDK

Small. Mobile. Free. UAE-built.

pip install dispatchai: run mobile-optimized LLMs on your phone, edge device, or laptop. 31 verified models, 24 of them phone-verified on Snapdragon 865 hardware, all free.

Quick Start

pip install dispatchai[gguf]

Chat with a model

from dispatchai import load_model

model = load_model("SmolLM2-135M-Instruct-mobile", backend="gguf")
response = model.chat("What is the capital of France?")
print(response)
# → "The capital of France is Paris."

🌐 Inference API

Use dispatchAI models via REST API (OpenAI-compatible):

import openai

client = openai.OpenAI(
    base_url="https://api.dispatchai.ai/v1",
    api_key="da-demo-key-0001"
)

response = client.chat.completions.create(
    model="dispatchAI/SmolLM2-135M-Instruct-mobile",
    messages=[{"role": "user", "content": "What is the capital of France?"}]
)
print(response.choices[0].message.content)
# → "The capital of France is Paris."
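For clients that cannot use the openai package, the same call can be sketched with the standard library. This assumes the endpoint follows the usual OpenAI conventions (a POST to `/chat/completions` under the base URL, with a Bearer token); the helper name `build_chat_request` is hypothetical and not part of the SDK. The sketch only builds the request and does not send it.

```python
import json
import urllib.request

BASE_URL = "https://api.dispatchai.ai/v1"  # endpoint from this README
API_KEY = "da-demo-key-0001"               # demo key from this README


def build_chat_request(prompt: str, model: str) -> urllib.request.Request:
    """Build (but do not send) a chat completion request.

    Hypothetical helper: the path and header layout are assumed to match
    the OpenAI chat completions convention.
    """
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",  # assumed OpenAI-style path
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_chat_request(
    "What is the capital of France?",
    "dispatchAI/SmolLM2-135M-Instruct-mobile",
)
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would send it; the response body should then be JSON with the reply under `choices[0].message.content`, if the API mirrors OpenAI's response shape.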

Pricing: $0.001/1K input tokens, $0.002/1K output tokens (10x cheaper than OpenAI)
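As a rough worked example of the pricing above, the cost of one request can be estimated from its token counts. The function name `request_cost_usd` and the token counts are illustrative assumptions; actual billing may round or meter differently.

```python
# Rates quoted in this README, in USD per 1,000 tokens.
INPUT_USD_PER_1K = 0.001
OUTPUT_USD_PER_1K = 0.002


def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of a single request (illustrative helper)."""
    input_cost = (input_tokens / 1000) * INPUT_USD_PER_1K
    output_cost = (output_tokens / 1000) * OUTPUT_USD_PER_1K
    return input_cost + output_cost


# Example: a 500-token prompt that produces a 200-token reply.
cost = request_cost_usd(input_tokens=500, output_tokens=200)
print(f"${cost:.4f} per request")
```

At these rates, the example request costs $0.0005 for input plus $0.0004 for output.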

Endpoint: https://api.dispatchai.ai/v1

Available Models:

dispatchAI/SmolLM2-135M-Instruct-mobile (101MB, 46 t/s on phone)
dispatchAI/Qwen2.5-0.5B-Instruct-mobile-int4 (469MB, 23 t/s on phone)
dispatchAI/Llama-3.2-1B-Instruct-Q4-mobile (770MB, 5.4 t/s on phone)

Local Inference

Find the best model for your phone

from dispatchai import recommend

rec = recommend(ram_mb=2048, task="chat")
print(f"Best model: {rec['recommended']['name']}")

List all models

from dispatchai import list_models

for m in list_models(task="chat"):
    print(f"  {m['name']}: {m['size_mb']}MB, {m['speed_tps']} t/s")

Estimate latency

from dispatchai import estimate_latency

lat = estimate_latency("1B", "Q4_K_M")
print(f"{lat['tokens_per_sec']} t/s on Snapdragon 865")
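To make a tokens-per-second figure easier to reason about, it can be converted into per-token latency and an approximate response time. This is plain arithmetic, not the SDK's `estimate_latency` implementation; the 5.4 t/s value is the phone speed this README lists for the Llama-3.2-1B Q4 model, and the helper names are hypothetical. Prompt processing time is ignored.

```python
def ms_per_token(tokens_per_sec: float) -> float:
    """Average milliseconds spent generating one token."""
    return 1000.0 / tokens_per_sec


def response_seconds(num_tokens: int, tokens_per_sec: float) -> float:
    """Approximate time to generate num_tokens (decode only, no prefill)."""
    return num_tokens / tokens_per_sec


TPS = 5.4  # Llama-3.2-1B Q4 phone speed quoted in this README

print(f"{ms_per_token(TPS):.0f} ms/token")
print(f"{response_seconds(100, TPS):.1f} s for a 100-token reply")
```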

Calculate cost savings

from dispatchai import calculate_cost

result = calculate_cost(daily_queries=10000, cloud_cost_per_1k=0.50)
print(f"Annual savings: ${result['savings']}")
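The formula behind `calculate_cost` is not documented here, so the following is only a back-of-the-envelope sketch under two loud assumptions: `cloud_cost_per_1k` means USD per 1,000 queries, and on-device inference has no per-query cost, so the savings equal the avoided cloud bill. The function name is hypothetical.

```python
DAYS_PER_YEAR = 365


def annual_cloud_cost_usd(daily_queries: int, cloud_cost_per_1k: float) -> float:
    """Yearly cloud spend, ASSUMING cloud_cost_per_1k is USD per 1,000 queries."""
    daily_cost = (daily_queries / 1000) * cloud_cost_per_1k
    return daily_cost * DAYS_PER_YEAR


# Same inputs as the calculate_cost call above.
annual = annual_cloud_cost_usd(daily_queries=10000, cloud_cost_per_1k=0.50)
print(f"Avoided cloud cost: ${annual:,.2f} per year")
```

Under these assumptions, 10,000 queries a day at $0.50 per 1,000 queries comes to $5 a day. The SDK's own result may differ if it accounts for device, energy, or maintenance costs.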

Installation Options

pip install dispatchai                    # Core (model catalog, recommendations)
pip install dispatchai[torch]             # + transformers/torch backend
pip install dispatchai[gguf]              # + llama.cpp GGUF backend
pip install dispatchai[full]              # + everything

Verified Models (June 2026)

✅ 31 models fully working (0 broken, 0 partial)
📱 24 models phone-verified on Snapdragon 865
All have correct chat formats documented

Top 3 Models

Model	Size	Phone Speed	Use Case
SmolLM2-135M	101MB	46.0 t/s	Ultra-fast, budget phones
Qwen2.5-0.5B-int4	469MB	23.2 t/s	Best balance for mobile
Llama-3.2-1B-Q4	770MB	5.4 t/s	Best quality under 1GB
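One way to read the table is as a size-versus-speed trade-off: pick the largest model that fits a storage budget and accept its speed. The sketch below hard-codes the three rows above and is not how the SDK's `recommend` function works; `largest_model_within` is a hypothetical helper, and file size is used only as a rough proxy, since runtime memory use will be higher than the file size.

```python
from typing import Optional, Tuple

# (name, size in MB, tokens/sec on phone), copied from the table above.
TOP_MODELS = [
    ("SmolLM2-135M", 101, 46.0),
    ("Qwen2.5-0.5B-int4", 469, 23.2),
    ("Llama-3.2-1B-Q4", 770, 5.4),
]


def largest_model_within(size_budget_mb: float) -> Optional[Tuple[str, int, float]]:
    """Return the largest listed model whose file fits the budget, or None."""
    fitting = [m for m in TOP_MODELS if m[1] <= size_budget_mb]
    if not fitting:
        return None
    return max(fitting, key=lambda m: m[1])


choice = largest_model_within(500)
if choice is not None:
    name, size_mb, tps = choice
    print(f"{name}: {size_mb}MB, {tps} t/s")
```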

About

Dispatch AI (FZE) — Sharjah Free Zone, UAE. License No. 10818.

🌐 dispatchai.ai | 🤗 huggingface.co/dispatchAI | API: api.dispatchai.ai

I think, therefore I ship.
