GroveMoE
GroveMoE is an open-source family of large language models developed by the AGI Center, Ant Research Institute.
How to use inclusionAI/GroveMoE-Base with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("text-generation", model="inclusionAI/GroveMoE-Base", trust_remote_code=True)
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("inclusionAI/GroveMoE-Base", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("inclusionAI/GroveMoE-Base", trust_remote_code=True)
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=40)
# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

How to use inclusionAI/GroveMoE-Base with vLLM:
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "inclusionAI/GroveMoE-Base"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "inclusionAI/GroveMoE-Base",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'

# Alternatively, run the model with Docker:
docker model run hf.co/inclusionAI/GroveMoE-Base
How to use inclusionAI/GroveMoE-Base with SGLang:
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "inclusionAI/GroveMoE-Base" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "inclusionAI/GroveMoE-Base",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'

# Alternatively, start the SGLang server with Docker:
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "inclusionAI/GroveMoE-Base" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "inclusionAI/GroveMoE-Base",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'

How to use inclusionAI/GroveMoE-Base with Docker Model Runner:
docker model run hf.co/inclusionAI/GroveMoE-Base
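
The curl calls above can also be issued from Python. The following is a minimal sketch using only the standard library, assuming a vLLM or SGLang server is already running at the given base URL and exposes the OpenAI-compatible `/v1/chat/completions` route. The helper names `build_chat_payload` and `chat`, and the `max_tokens` default, are illustrative choices and not part of the original instructions.

```python
import json
import urllib.request


def build_chat_payload(model: str, prompt: str, max_tokens: int = 64) -> dict:
    """Build the JSON body for an OpenAI-compatible chat completion request."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        # max_tokens is an addition for illustration; the curl examples omit it.
        "max_tokens": max_tokens,
    }


def chat(base_url: str, model: str, prompt: str, timeout: float = 60.0) -> str:
    """POST a single-turn chat request and return the assistant's reply text."""
    payload = json.dumps(build_chat_payload(model, prompt)).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


# Example calls (require a running server, so they are left commented out):
# chat("http://localhost:8000", "inclusionAI/GroveMoE-Base", "What is the capital of France?")   # vLLM
# chat("http://localhost:30000", "inclusionAI/GroveMoE-Base", "What is the capital of France?")  # SGLang
```

Only the base URL differs between the two servers, so the same helper covers port 8000 (vLLM) and port 30000 (SGLang).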
We introduce GroveMoE, a new sparse architecture that uses adjugate experts for dynamic computation allocation. The released models are listed below:
| Model | #Total Params | #Activated Params | HF Download | MS Download |
|---|---|---|---|---|
| GroveMoE-Base | 33B | 3.14–3.28B | 🤗 HuggingFace | 📦 ModelScope |
| GroveMoE-Inst | 33B | 3.14–3.28B | 🤗 HuggingFace | 📦 ModelScope |
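
The activated-parameter count in the table is a range, not a single number. One plausible reading, sketched below as a toy model and not as the released implementation, is that routed experts are partitioned into groups that each share one adjugate expert, and that a group's adjugate expert is evaluated once per token no matter how many of that group's experts the router selects. The grouping rule (`expert // group_size`), the group size, and the routing choices are all hypothetical values chosen for illustration.

```python
from typing import Sequence


def adjugate_calls(selected_experts: Sequence[int], group_size: int) -> int:
    """Count the distinct expert groups touched by one token's routed experts.

    ASSUMPTION: experts are grouped contiguously (expert // group_size) and each
    group shares one adjugate expert that runs at most once per token.
    """
    if group_size <= 0:
        raise ValueError("group_size must be positive")
    groups = {expert // group_size for expert in selected_experts}
    return len(groups)


# Hypothetical routing decisions for two tokens, 8 experts each, groups of 2.
spread = [0, 2, 4, 6, 8, 10, 12, 14]  # every selected expert sits in a different group
clustered = [0, 1, 2, 3, 4, 5, 6, 7]  # selected experts pair up inside groups

print(adjugate_calls(spread, 2))     # 8 adjugate evaluations
print(adjugate_calls(clustered, 2))  # 4 adjugate evaluations
```

Under these assumptions the per-token compute depends on how the router's choices cluster, which is one way an activated-parameter count could vary between a lower and an upper bound.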
@article{GroveMoE,
title = {GroveMoE: Towards Efficient and Superior MoE LLMs with Adjugate Experts},
author = {Wu, Haoyuan and Chen, Haoxing and Chen, Xiaodong and Zhou, Zhanchao and Chen, Tieyuan and Zhuang, Yihong and Lu, Guoshan and Zhao, Junbo and Liu, Lin and Huang, Zenan and Lan, Zhenzhong and Yu, Bei and Li, Jianguo},
journal = {arXiv preprint arXiv:2508.07785},
year = {2025}
}