Llama-3 chat vector
- Update 0426: There is a small problem with the deployment of the model 'Llama-3-Seagull-Evo-8B', but we hope to have it back soon!
- Update 0526: Check out our newest EMM model, Alpha-Ko-8B-Instruct.
This is a 'modelified' version of the chat vector from the paper Chat Vector: A Simple Approach to Equip LLMs with Instruction Following and Model Alignment in New Languages. So this is not a model; it's just a weight diff, published for my own ease of use (and yours too)!
What I understand here: the 'chat vector method' is a merging method that uses the weight differences between the base model, the continuously pre-trained (usually language-transferred) model, and the chat model. So the recipe is
model(base) + weight_diff(continuously pre-trained) + weight_diff(instruct) or
model(base) + weight_diff(continuously pre-trained + fine-tuned) + weight_diff(instruct).
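To make the arithmetic in the recipe concrete, here is a minimal toy sketch. It is not the code used to build this repository: state dicts are stood in for by plain Python dicts of float lists, and the names `weight_diff`, `apply_diffs`, and the example values are hypothetical. It also assumes every model shares the same parameter names and shapes.

```python
# Toy illustration of the recipe: merged = base + diff(CP) + diff(instruct).
# Real checkpoints hold tensors; plain lists of floats are used here so the
# example runs without any dependencies.

def weight_diff(tuned, base):
    """Return tuned - base, element-wise, for every parameter in base."""
    return {
        name: [t - b for t, b in zip(tuned[name], base[name])]
        for name in base
    }

def apply_diffs(base, *diffs):
    """Return a copy of base with every diff added element-wise."""
    merged = {name: list(values) for name, values in base.items()}
    for diff in diffs:
        for name, delta in diff.items():
            merged[name] = [m + d for m, d in zip(merged[name], delta)]
    return merged

# Hypothetical one-parameter "models".
base = {"layer.weight": [1.0, 2.0, 3.0]}
cp = {"layer.weight": [1.5, 2.0, 2.0]}        # continuously pre-trained
instruct = {"layer.weight": [1.0, 3.0, 3.5]}  # chat / instruct model

cp_diff = weight_diff(cp, base)
chat_vector = weight_diff(instruct, base)     # the kind of diff this repo stores

merged = apply_diffs(base, cp_diff, chat_vector)
print(merged)
```

Since `base + cp_diff` is just the continuously pre-trained model, the first recipe amounts to adding the chat vector to the CP checkpoint; the second recipe does the same with a CP + fine-tuned checkpoint in place of `cp`.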
My initial purpose was to compare which ordering is better, llama3 → CP + chat vector → FT vs. llama3 → CP → FT + chat vector. Before doing that, it seems reasonable to compare the chat vector with other methods in Mergekit.
| Model | Method | KoBEST (F1) | HAE-RAE (acc) |
|---|---|---|---|
| beomi/Llama-3-Open-Ko-8B-Instruct-preview | chat vector | 0.4368 | 0.439 |
| kuotient/Llama-3-Ko-8B-ties | Ties | 0.4821 | 0.5160 |
| kuotient/Llama-3-Ko-8B-dare-ties | Dare-ties | 0.4950 | 0.5399 |
| kuotient/Llama-3-Ko-8B-TA | Task Arithmetic (maybe...? not sure about this) | - | - |
| WIP | Model Stock (I haven't read this paper yet, but still) | - | - |
| kuotient/Llama-3-Seagull-Evo-8B | Evolutionary Model Merging | 0.6139 | 0.5344 |
| --- | --- | --- | --- |
| meta-llama/Meta-Llama-3-8B | Base | - | - |
| meta-llama/Meta-Llama-3-8B-Instruct | - | 0.4239 | 0.4931 |
| beomi/Llama-3-Open-Ko-8B | Korean Base | 0.4374 | 0.3813 |
All that aside, I'd like to thank @beomi for creating such an awesome Korean-based model.