Instructions for using OrionLLM/NanoCoder-0.6b with libraries and local apps.
- Libraries
- Transformers
How to use OrionLLM/NanoCoder-0.6b with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="OrionLLM/NanoCoder-0.6b")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("OrionLLM/NanoCoder-0.6b")
model = AutoModelForCausalLM.from_pretrained("OrionLLM/NanoCoder-0.6b")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```
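For multi-turn use, the pipeline's output can be fed back in as the next turn's history. The following is a minimal sketch, assuming a recent Transformers version in which `pipe(messages)` returns `[{"generated_text": [...]}]` with the assistant's reply appended as the last message. `ask` is a hypothetical helper name, and `max_new_tokens=256` is an arbitrary example value.

```python
from typing import Any, Callable

Message = dict[str, str]


def ask(pipe: Callable[..., Any], history: list[Message], prompt: str,
        max_new_tokens: int = 256) -> tuple[str, list[Message]]:
    """Send one user turn through a text-generation pipeline (hypothetical helper).

    Assumes the pipeline returns [{"generated_text": [...messages...]}],
    where the last message is the newly generated assistant reply.
    Returns the reply text and the updated conversation history.
    """
    # Build a new list so the caller's history is not mutated
    messages = history + [{"role": "user", "content": prompt}]
    # Generation kwargs such as max_new_tokens are forwarded to generate()
    output = pipe(messages, max_new_tokens=max_new_tokens)
    conversation = output[0]["generated_text"]
    reply = conversation[-1]
    if reply.get("role") != "assistant":
        raise ValueError("expected the last message to be the assistant reply")
    return reply["content"], conversation
```

With the `pipe` object from the pipeline example, a first turn would be `reply, history = ask(pipe, [], "Write a function that reverses a string.")`, and a follow-up passes the returned `history` back in.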
- Local Apps
- vLLM
How to use OrionLLM/NanoCoder-0.6b with vLLM:
Install from pip and serve model
```bash
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "OrionLLM/NanoCoder-0.6b"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "OrionLLM/NanoCoder-0.6b",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```

Use Docker

```bash
docker model run hf.co/OrionLLM/NanoCoder-0.6b
```
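The curl call can also be made from Python without extra dependencies. The following is a minimal standard-library sketch, assuming the `vllm serve` server is listening on its default `localhost:8000` and returns the standard OpenAI chat-completion shape (`choices[0].message.content`). The helper names are hypothetical. Pointing `base_url` at `http://localhost:30000/v1` should work the same way for the SGLang server.

```python
import json
import urllib.request

# Default address of `vllm serve` (port 8000), matching the curl example
BASE_URL = "http://localhost:8000/v1"
MODEL = "OrionLLM/NanoCoder-0.6b"


def build_chat_request(prompt: str, base_url: str = BASE_URL,
                       model: str = MODEL) -> urllib.request.Request:
    """Build the same POST request as the curl example (hypothetical helper)."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_reply(response_body: bytes) -> str:
    """Return the assistant text from an OpenAI-style chat completion body."""
    data = json.loads(response_body)
    return data["choices"][0]["message"]["content"]


def chat(prompt: str, base_url: str = BASE_URL, timeout: float = 60.0) -> str:
    """Send one prompt to a running server and return the reply text."""
    request = build_chat_request(prompt, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return extract_reply(response.read())
```

Keeping request construction and response parsing separate from the network call makes both easy to check without a running server.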
- SGLang
How to use OrionLLM/NanoCoder-0.6b with SGLang:
Install from pip and serve model
```bash
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "OrionLLM/NanoCoder-0.6b" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "OrionLLM/NanoCoder-0.6b",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```

Use Docker images
```bash
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "OrionLLM/NanoCoder-0.6b" \
        --host 0.0.0.0 \
        --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "OrionLLM/NanoCoder-0.6b",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```

- Docker Model Runner
How to use OrionLLM/NanoCoder-0.6b with Docker Model Runner:
```bash
docker model run hf.co/OrionLLM/NanoCoder-0.6b
```
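The vLLM and SGLang servers both expose an OpenAI-compatible API, which also supports streaming. With `"stream": true` in the request body, the reply arrives as server-sent events: one `data: {...}` line per chunk, followed by a final `data: [DONE]`. The following is a minimal parser sketch, assuming the standard OpenAI chunk shape (`choices[0].delta.content`). `iter_stream_text` is a hypothetical name. The function does not depend on any particular transport, so it accepts any iterable of raw lines.

```python
from __future__ import annotations

import json
from collections.abc import Iterable, Iterator


def iter_stream_text(lines: Iterable[bytes | str]) -> Iterator[str]:
    """Yield text fragments from an OpenAI-style server-sent event stream.

    Assumes each event is a line of the form `data: {json}` and that the
    stream ends with `data: [DONE]`. Blank lines and other SSE fields
    (such as comment lines starting with ':') are skipped.
    """
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.strip()
        if not line.startswith("data:"):
            continue
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            # The first chunk may carry only the role and the last only a
            # finish_reason, so a missing or empty content field is skipped
            content = choice.get("delta", {}).get("content")
            if content:
                yield content
```

To use it, add `"stream": true` to the request payload and iterate over the open HTTP response line by line, for example `"".join(iter_stream_text(response))` on a `urllib` response object.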
A compact 0.6B coding model built for strong reasoning efficiency.
NanoCoder is a small, 0.6B-parameter, coding-focused language model designed for chronological reasoning at high and xhigh intensity on programming tasks.
It is built to deliver surprisingly strong structured reasoning and coding performance for its size, focusing on consistency, logical step progression, and efficient problem solving.
While NanoCoder is not intended to be a general everyday assistant, it is a small but capable specialist model that performs well within its class and remains reliable for compact code reasoning workloads.
Key Characteristics
- 0.6B parameters
- Dedicated to code
- Optimized for high reasoning intensity
- Chronological reasoning style
- Strong consistency for a compact model
- Designed for efficient performance at a small size
Limitations
NanoCoder is a small specialized model.
Because of that:
- It may not match larger models on broad real-world assistant tasks
- It is not primarily designed for daily casual use
- It performs best when used for focused coding and reasoning workloads
- Its main strength is efficiency, consistency, and reasoning quality relative to size