Instructions for using OrionLLM/NanoCoder-0.6b with libraries and local apps.

Libraries

How to use OrionLLM/NanoCoder-0.6b with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="OrionLLM/NanoCoder-0.6b")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("OrionLLM/NanoCoder-0.6b")
model = AutoModelForCausalLM.from_pretrained("OrionLLM/NanoCoder-0.6b")
messages = [
    {"role": "user", "content": "Who are you?"},
]
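# Render the chat template and move the resulting tensors to the model's device
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
# Decode only the newly generated tokens by slicing off the prompt tokens
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))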

Local Apps

vLLM

How to use OrionLLM/NanoCoder-0.6b with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "OrionLLM/NanoCoder-0.6b"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "OrionLLM/NanoCoder-0.6b",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
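
The same request can be sent from Python. The following is a minimal sketch using only the standard library, assuming a server is already running locally and exposes the OpenAI-compatible `/v1/chat/completions` route as in the curl call above. The helper names `build_chat_request`, `extract_reply`, and `chat` are hypothetical and chosen for illustration; the response layout (`choices[0].message.content`) follows the OpenAI chat completions format.

```python
import json
import urllib.request

MODEL_ID = "OrionLLM/NanoCoder-0.6b"


def build_chat_request(prompt, base_url="http://localhost:8000", model=MODEL_ID):
    """Build a POST request for the OpenAI-compatible chat completions route."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_reply(response_body):
    """Return the assistant text from a parsed chat completions response."""
    return response_body["choices"][0]["message"]["content"]


def chat(prompt, base_url="http://localhost:8000", timeout=60):
    """Send one user message to a running server and return the reply text."""
    request = build_chat_request(prompt, base_url=base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return extract_reply(json.load(response))
```

With the server up, `chat("Write a Python function that reverses a string.")` returns the reply as a string. Passing `base_url="http://localhost:30000"` targets a server on port 30000 instead.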

Use Docker

docker model run hf.co/OrionLLM/NanoCoder-0.6b

SGLang

How to use OrionLLM/NanoCoder-0.6b with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "OrionLLM/NanoCoder-0.6b" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "OrionLLM/NanoCoder-0.6b",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "OrionLLM/NanoCoder-0.6b" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "OrionLLM/NanoCoder-0.6b",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use OrionLLM/NanoCoder-0.6b with Docker Model Runner:
```
docker model run hf.co/OrionLLM/NanoCoder-0.6b
```
Browse Quantizations to use this model in llama.cpp, Ollama, LM Studio, or any compatible app.


A compact 0.6B coding model built for strong reasoning efficiency.

NanoCoder is a small, 0.6B-parameter, coding-focused language model designed for high and "xhigh" reasoning intensity in programming tasks, using a chronological, step-by-step reasoning style.

It is built to deliver strong structured reasoning and coding performance for its size, with a focus on consistency, logical step progression, and efficient problem solving.

While NanoCoder is not intended to be a general everyday assistant, it is a small but capable specialist model that performs well within its class and remains reliable for compact code reasoning workloads.

Key Characteristics

0.6B parameters
Dedicated to code
Optimized for high reasoning intensity
Chronological reasoning style
Strong consistency for a compact model
Designed for efficient performance despite its small size
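
When working with the model's step-by-step output, it can help to separate the reasoning from the final answer. The sketch below assumes the reasoning is wrapped in `<think>...</think>` tags, as in Qwen3 chat output. This card does not confirm that NanoCoder uses that format, so treat the tag names as an assumption. The helper name `split_reasoning` is hypothetical.

```python
import re

# Assumption: reasoning is wrapped in <think>...</think>, as in Qwen3 chat output.
THINK_BLOCK = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(text):
    """Return (reasoning, answer); reasoning is "" when no block is found."""
    match = THINK_BLOCK.search(text)
    if match is None:
        return "", text.strip()
    reasoning = match.group(1).strip()
    # Keep everything outside the first reasoning block as the answer
    answer = (text[:match.start()] + text[match.end():]).strip()
    return reasoning, answer
```

If the model's output contains no such tags, the function returns the whole text as the answer, so it is safe to apply either way.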

Limitations

NanoCoder is a small specialized model.

Because of that:

It may not match larger models on broad real-world assistant tasks
It is not primarily designed for daily casual use
It performs best when used for focused coding and reasoning workloads
Its main strength is efficiency, consistency, and reasoning quality relative to size

Downloads last month: 19

Safetensors

Model size: 0.6B params
Tensor type: BF16
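
As a rough worked example of what these figures imply, the weights alone can be sized from the parameter count and the tensor type. The sketch below uses the rounded 0.6B figure from this card, not an exact parameter count, and ignores activations, the KV cache, and framework overhead. The helper name `weight_bytes` is hypothetical.

```python
# Bytes per parameter for common tensor types (BF16 stores 16 bits per value)
BYTES_PER_PARAM = {"BF16": 2, "FP16": 2, "FP32": 4}


def weight_bytes(num_params, tensor_type="BF16"):
    """Approximate bytes needed to hold the weights alone."""
    return int(num_params * BYTES_PER_PARAM[tensor_type])


nominal_params = 0.6e9  # rounded figure from the model card, not an exact count
print(f"{weight_bytes(nominal_params) / 1e9:.1f} GB")  # prints 1.2 GB
```

Actual memory use during inference will be higher than this estimate.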

Model tree for OrionLLM/NanoCoder-0.6b

Base model: Qwen/Qwen3-0.6B-Base
Finetuned: Qwen/Qwen3-0.6B
Finetuned (925): this model
Quantizations: 2 models
