Instructions for using beyoru/Luna with libraries, inference providers, notebooks, and local apps.
- Libraries
- Transformers
How to use beyoru/Luna with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="beyoru/Luna")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("beyoru/Luna")
model = AutoModelForCausalLM.from_pretrained("beyoru/Luna")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```

- Inference
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use beyoru/Luna with vLLM:
Install from pip and serve model
```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "beyoru/Luna"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "beyoru/Luna",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```

Use Docker
docker model run hf.co/beyoru/Luna
- SGLang
How to use beyoru/Luna with SGLang:
Install from pip and serve model
```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "beyoru/Luna" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "beyoru/Luna",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```

Use Docker images
```shell
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "beyoru/Luna" \
        --host 0.0.0.0 \
        --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "beyoru/Luna",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```

- Docker Model Runner
How to use beyoru/Luna with Docker Model Runner:
docker model run hf.co/beyoru/Luna
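Both the vLLM and SGLang servers above expose an OpenAI-compatible `/v1/chat/completions` endpoint, so the curl calls can also be made from Python. The following is a minimal sketch using only the standard library (`urllib.request`, `json`). The ports (8000 for vLLM, 30000 for SGLang) come from the commands above; the helper names `build_chat_request` and `chat` are hypothetical, and the sketch assumes an OpenAI-style response with the reply in `choices[0].message.content`.

```python
import json
import urllib.request

# Endpoints taken from the commands above (vLLM default port 8000,
# SGLang started with --port 30000).
VLLM_URL = "http://localhost:8000/v1/chat/completions"
SGLANG_URL = "http://localhost:30000/v1/chat/completions"


def build_chat_request(url: str, prompt: str,
                       model: str = "beyoru/Luna") -> urllib.request.Request:
    """Build (but do not send) a POST request mirroring the curl example."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(url: str, prompt: str, timeout: float = 60.0) -> str:
    """Send the request to a running server and return the reply text."""
    req = build_chat_request(url, prompt)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = json.load(resp)
    # Assumed OpenAI-style response shape: choices[0].message.content.
    return body["choices"][0]["message"]["content"]
```

With a server running, `print(chat(VLLM_URL, "What is the capital of France?"))` should print the model's answer; pass `SGLANG_URL` instead to target the SGLang server. Keeping request construction separate from sending makes the payload easy to inspect without a live server.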
Are there plans for future updates?
Nice model. In my testing in SillyTavern, it performs far better than other models of the same size or even larger ones, and in some scenarios it even approaches the capabilities of an untuned 8B model with reasoning abilities. Are there any plans for further updates in the near future? Thank you for your work.
Glad you like it. As for future plans, I need to think more about how to continue developing this type of model. I am also very curious how you evaluated it.
I use LM Studio as the backend to serve the model over an API, load the same character card for every model, and chat with each one through SillyTavern. I have found that models of the same size suffer, to varying degrees, from the following issues:
- Inability to understand the character card or significant misunderstanding of it.
- In a story meant to unfold gradually, revealing much of the character's background, or even the ending, in the very first reply.
- Lack of Chinese support. Even though some models are based on Qwen, they still cannot handle, or do not understand, Chinese prompts and dialogue.
- Severe degradation in understanding complex prompts. Some models of the same size perform decently when the prompts are shortened and simplified, but when faced with the complex prompts of SillyTavern + character cards, their comprehension ability declines significantly. Although Luna also shows a noticeable decline in understanding complex prompts, it performs exceptionally well among the same-sized models I have tested.
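To make the "complex prompt" point concrete: a frontend such as SillyTavern roughly packs the character card into the context ahead of the chat history, so even a short user message reaches the model wrapped in a long system prompt. The sketch below is a simplified, hypothetical approximation of that assembly, not SillyTavern's actual prompt format; the `CharacterCard` fields and the section layout are assumptions chosen for illustration. The resulting list has the same shape as the `messages` array an OpenAI-compatible server such as LM Studio's accepts.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class CharacterCard:
    # Hypothetical subset of character-card fields, for illustration only.
    name: str
    description: str
    personality: str = ""
    scenario: str = ""
    example_dialogue: list[str] = field(default_factory=list)


def build_messages(card: CharacterCard, history: list[dict],
                   user_input: str) -> list[dict]:
    """Pack the card into one system message, then append the chat history."""
    sections = [f"You are {card.name}.", card.description]
    if card.personality:
        sections.append(f"Personality: {card.personality}")
    if card.scenario:
        sections.append(f"Scenario: {card.scenario}")
    if card.example_dialogue:
        sections.append("Example dialogue:\n" + "\n".join(card.example_dialogue))
    # Everything about the character, including details meant to stay hidden
    # until later in the story, sits in this single system message.
    messages = [{"role": "system", "content": "\n\n".join(sections)}]
    messages.extend(history)
    messages.append({"role": "user", "content": user_input})
    return messages
```

Because the whole card is visible to the model on every turn, a small model has to both follow a long instruction block and decide what not to reveal yet, which is one plausible reason the failures listed above show up.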
Suggestions:
- Enhance the understanding of complex prompts.
- Perhaps also add reasoning capabilities to the model?
However, these are just my personal suggestions. I think the model is already excellent at this stage. I don't even know how to get the datasets required for the features I suggested :) Thank you.
I have added reasoning capabilities to this model.