kimodo-api πŸƒ

A REST API wrapper around NVIDIA Kimodo β€” the state-of-the-art text-to-motion diffusion model trained on 700 hours of commercial mocap data.

This image turns Kimodo into a microservice you can call from any pipeline, no Python environment needed.

Quick Start

docker pull ghcr.io/eyalenav/kimodo-api:latest

docker run --rm --gpus '"device=0"' -p 9551:9551 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  -e HUGGINGFACE_TOKEN=hf_... \
  ghcr.io/eyalenav/kimodo-api:latest

⚠️ First run downloads Llama-3-8B-Instruct (~16GB) for the text encoder. Requires a HuggingFace token with access to meta-llama/Meta-Llama-3-8B-Instruct.
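If you prefer Docker Compose, the `docker run` command above maps onto a service definition roughly like the following. This is a sketch, not a shipped file: it assumes Docker Compose v2 with the NVIDIA Container Toolkit installed, and uses the Compose `deploy.resources.reservations.devices` GPU syntax.

```yaml
services:
  kimodo-api:
    image: ghcr.io/eyalenav/kimodo-api:latest
    ports:
      - "9551:9551"
    environment:
      # Export HUGGINGFACE_TOKEN in your shell or a .env file
      - HUGGINGFACE_TOKEN=${HUGGINGFACE_TOKEN}
    volumes:
      # Persist model downloads across container restarts
      - ~/.cache/huggingface:/root/.cache/huggingface
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
```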

API

POST /generate

Generate a motion clip from a text prompt.

curl -X POST http://localhost:9551/generate \
  -H "Content-Type: application/json" \
  -d '{"prompt": "person pushing through a crowd aggressively"}'

Response: NPZ file (binary) β€” SOMA 77-joint skeleton format, compatible with BVH export.
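From Python, the same call can be made with the standard library and the binary NPZ response decoded in memory with NumPy. This is a minimal client sketch, assuming only what the docs above state (JSON request with a `prompt` field, binary NPZ response); the array names inside the archive depend on the model's output format.

```python
import io
import json
import urllib.request

import numpy as np

API_URL = "http://localhost:9551"  # default port from the Quick Start


def npz_to_arrays(raw: bytes) -> dict:
    """Decode a binary NPZ payload into a dict of numpy arrays."""
    with np.load(io.BytesIO(raw)) as archive:
        return {name: archive[name] for name in archive.files}


def generate_motion(prompt: str, host: str = API_URL) -> dict:
    """POST a prompt to /generate and return the decoded motion arrays."""
    req = urllib.request.Request(
        f"{host}/generate",
        data=json.dumps({"prompt": prompt}).encode(),
        headers={"Content-Type": "application/json"},
    )
    # Diffusion sampling can take a while; use a generous timeout.
    with urllib.request.urlopen(req, timeout=600) as resp:
        return npz_to_arrays(resp.read())
```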

GET /health

curl http://localhost:9551/health
# {"status": "ok"}
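Because the first start downloads ~16 GB of weights, callers should poll `/health` before sending work. A small stdlib-only helper for that, assuming the `{"status": "ok"}` response shown above:

```python
import json
import time
import urllib.request


def wait_for_ready(host: str = "http://localhost:9551",
                   timeout_s: float = 300.0,
                   interval_s: float = 5.0) -> bool:
    """Poll GET /health until it reports {"status": "ok"} or the deadline passes."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(f"{host}/health", timeout=5) as resp:
                if json.load(resp).get("status") == "ok":
                    return True
        except OSError:
            pass  # connection refused or timed out: server still starting
        time.sleep(interval_s)
    return False
```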

Requirements

| Resource | Minimum                        |
|----------|--------------------------------|
| GPU      | RTX 3090 / A100 / RTX 6000 Ada |
| VRAM     | 24 GB                          |
| RAM      | 32 GB                          |
| Disk     | 50 GB (model weights)          |

What's inside

  • Kimodo β€” NVIDIA's kinematic motion diffusion model (77-joint SOMA skeleton)
  • LLM2Vec text encoder backed by Llama-3-8B-Instruct
  • FastAPI server on port 9551
  • Health check + graceful startup

Part of VisionAI-Flywheel

This service is one component of a full synthetic surveillance data pipeline:

[kimodo-api] β†’ NPZ motion
    ↓
[render-api] β†’ SOMA mesh render (MP4)
    ↓
[cosmos-transfer] β†’ Sim2Real photorealistic video
    ↓
[NVIDIA VSS] β†’ VLM annotation β†’ fine-tuning dataset

πŸ”— Full pipeline: github.com/EyalEnav/VisionAI-Flywheel

License

Apache 2.0 β€” see LICENSE

Kimodo model weights are released under the NVIDIA Open Model License and downloaded at runtime. They are not bundled in this image.
