Instructions for using beyoru/Luna with libraries and local apps.

Libraries

How to use beyoru/Luna with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="beyoru/Luna")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("beyoru/Luna")
model = AutoModelForCausalLM.from_pretrained("beyoru/Luna")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use beyoru/Luna with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "beyoru/Luna"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "beyoru/Luna",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
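
The same chat-completions request can also be sent from Python. The following is a minimal sketch using only the standard library, assuming the server is listening on localhost:8000 as in the curl call above. The helper names `build_chat_request` and `send_chat_request` are illustrative and are not part of vLLM.

```python
import json
import urllib.request


def build_chat_request(base_url, model, user_message):
    """Build a POST request for an OpenAI-compatible chat completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat_request(request, timeout=60):
    """Send the request and return the text of the first returned choice."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


# Usage (requires a running server, so it is left commented out):
# request = build_chat_request(
#     "http://localhost:8000", "beyoru/Luna", "What is the capital of France?"
# )
# print(send_chat_request(request))
```

Pointing `base_url` at `http://localhost:30000` should work the same way for an SGLang server started as shown further down, since both expose the same OpenAI-compatible route.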

Use Docker

docker model run hf.co/beyoru/Luna

SGLang

How to use beyoru/Luna with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "beyoru/Luna" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "beyoru/Luna",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "beyoru/Luna" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "beyoru/Luna",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner

How to use beyoru/Luna with Docker Model Runner:

```
docker model run hf.co/beyoru/Luna
```

Is there an updated plan?

by nemozxy123 - opened Sep 18, 2025

Discussion

nemozxy123

Sep 18, 2025

Nice LLM. In my testing, this LLM performs far better than other LLMs of the same size or even larger in SillyTavern, and in some scenarios, it even approaches the capabilities of an 8B untuned LLM with reasoning abilities. Are there any further update plans in the near future? Thank you for your work.

beyoru

Owner Sep 19, 2025

I hope you like it. As for future plans, I need to think it over more before continuing to develop this type of model. I am also very curious how you evaluated this model.

nemozxy123

Sep 19, 2025

I use LM Studio as the backend to provide API services, select the same character card, and chat with the LLM through SillyTavern. I have found that models of the same size more or less suffer from the following issues:

1. Inability to understand the character card or significant misunderstanding of it.
2. For a progressive story, directly revealing a large amount of character setting or even the ending in the first dialogue.
3. Lack of Chinese support. Even though some models are based on Qwen, they still "cannot" or "do not understand" Chinese prompts and dialogues.
4. Severe degradation in understanding complex prompts. Some models of the same size perform decently when the prompts are shortened and simplified, but when faced with the complex prompts of SillyTavern + character cards, their comprehension ability declines significantly. Although Luna also shows a noticeable decline in understanding complex prompts, it performs exceptionally well among the same-sized models I have tested.
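
For anyone who wants to repeat this kind of side-by-side comparison outside the SillyTavern UI, here is a rough sketch of the idea: the same character card and the same scripted user turns are sent to each model through an OpenAI-compatible endpoint, and the transcripts are collected for manual review. Everything in it is an assumption rather than the setup actually used above: the function names are made up, the character card is passed as a plain system message (SillyTavern builds much more complex prompts), and the commented usage assumes a local server at `http://localhost:1234` with placeholder model names.

```python
import json
import urllib.request


def build_messages(character_card, history, user_turn):
    """Character card as the system prompt, then earlier turns, then the new user turn."""
    return [
        {"role": "system", "content": character_card},
        *history,
        {"role": "user", "content": user_turn},
    ]


def chat(base_url, model, messages, timeout=120):
    """Call an OpenAI-compatible chat completions endpoint and return the reply text."""
    request = urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps({"model": model, "messages": messages}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)["choices"][0]["message"]["content"]


def compare_models(models, character_card, user_turns, chat_fn):
    """Run identical scripted turns against each model; return one transcript per model."""
    transcripts = {}
    for model in models:
        history = []
        for turn in user_turns:
            reply = chat_fn(model, build_messages(character_card, history, turn))
            history.append({"role": "user", "content": turn})
            history.append({"role": "assistant", "content": reply})
        transcripts[model] = history
    return transcripts


# Usage (requires a running server; URL and model names are placeholders):
# transcripts = compare_models(
#     ["model-a", "model-b"],
#     "You are Luna, a ...",
#     ["Hello, who are you?", "What happens next?"],
#     lambda model, messages: chat("http://localhost:1234", model, messages),
# )
```

Passing the network call in as `chat_fn` keeps the comparison loop independent of any particular backend. Judging the issues listed above (misread character cards, early spoilers, weak Chinese support) still means reading the transcripts by hand.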

Suggestions:

1. Enhance the understanding of complex prompts.
2. Perhaps also add reasoning capabilities to the model?

However, these are just my personal suggestions. I think the model is already excellent at this stage. I don't even know how to get the datasets required for the features I suggested :) Thank you.

beyoru

Owner Sep 23, 2025

I have added reasoning to this model.

beyoru changed discussion status to closed Sep 23, 2025
