- 295B total / 21B active / 256K context
- Fused fast-and-slow thinking in a single model
- First model trained on Hunyuan's rebuilt pretraining + RL infra (Feb-Apr)

Benchmarks:
- SWE-Bench Verified, Terminal-Bench 2.0, BrowseComp, WideSearch: competitive results, particularly strong on agentic tool use
- Top score on Tsinghua's 2026 Spring math PhD qualifying exam
- Strong context-learning and instruction-following on Tencent's CL-bench / CL-bench-Life
🔥 GRM-2.5 - The most POWERFUL model for local inference

GRM-2.5 is the newest model from Orion LLM Labs. It delivers consistent raw reasoning and can generate very precise responses, comparable to large models, while staying at only 4B parameters.

GRM-2.5 is also a strong option for local agentic environments, performing very well at coding, terminal-agent tasks, and more. It can generate 1,000 lines of consistent code and program like large models. GRM-2.5 is the best base for fine-tuning to date and has vision, meaning it can interpret images and videos.
🔥 GRM2 - The small one that surpasses the big ones. What if a 3B-parameter model could beat a 32B-parameter model on every benchmark? We show that it can. GRM2 is a 3B-parameter model based on the Llama architecture, trained for long reasoning and high performance on complex tasks. It is the first 3B model to outperform Qwen3-32B on ALL benchmarks, and it outperforms o3-mini on almost all benchmarks. 🤗 Model: OrionLLM/GRM2-3b. It is also the first 3B model to generate over 1,000 lines of code and to score 39.0 on xBench-DeepSearch-2510.
Over the past year, we've seen a shift in LLM Post-Training. Previously, Supervised Fine-Tuning was the most important part: making models imitate curated Question-Answer pairs.
Now we also have Reinforcement Learning with Verifiable Rewards. With techniques like GRPO, models can learn through trial and error in dynamic environments. They can climb to new heights without relying on expensively prepared data.
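The idea behind verifiable rewards can be sketched in a few lines. Below is a minimal, illustrative example (not any specific library's API; the function names and the regex-based answer extraction are my own assumptions): a reward is computed by programmatically checking the model's answer, and a GRPO-style group baseline turns those rewards into advantages.

```python
import re

def math_reward(completion: str, expected_answer: str) -> float:
    """Return 1.0 if the last number in the completion matches the expected answer."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", completion)
    if not numbers:
        return 0.0
    return 1.0 if numbers[-1] == expected_answer else 0.0

# In a GRPO-style loop, several completions are sampled for one prompt,
# each is scored, and the group mean serves as the baseline for advantages.
completions = ["The answer is 12.", "Let me think... 7", "12"]
rewards = [math_reward(c, "12") for c in completions]
baseline = sum(rewards) / len(rewards)
advantages = [r - baseline for r in rewards]
```

No human-labeled preference data is needed here: the checker itself is the reward signal, which is what makes this style of training cheap to scale.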
But what are these environments in practice? And how do you build them effectively?
Fascinated by these concepts, I spent time exploring this space through experiments, post-training Small Language Models. I've packaged everything I learned into this short course.
What you'll learn
🔹 Agents, Environments, and LLMs: how to map Reinforcement Learning concepts to the LLM domain
🔹 How to use Verifiers (open-source library by Prime Intellect) to build RL environments as software artifacts
🔹 Common patterns: how to build single-turn, multi-turn, and tool-use environments
🔹 Hands-on: turn a small language model (LFM2-2.6B by LiquidAI) into a Tic Tac Toe master
  🔸 Build the game Environment
  🔸 Use it to generate synthetic data for SFT warm-up
  🔸 Group-based Reinforcement Learning
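To give a flavor of what the game environment step looks like, here is a minimal sketch of Tic Tac Toe with a reward signal suitable for RL post-training. This is illustrative only; the course builds its environment with the Verifiers library, whose actual API will differ, and the `step`/`winner` names and reward values here are my own choices.

```python
from typing import Optional

# All eight winning lines on a 3x3 board stored as a flat list of 9 cells.
WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
             (0, 3, 6), (1, 4, 7), (2, 5, 8),
             (0, 4, 8), (2, 4, 6)]

def winner(board: list) -> Optional[str]:
    """Return 'X' or 'O' if someone has three in a row, else None."""
    for a, b, c in WIN_LINES:
        if board[a] and board[a] == board[b] == board[c]:
            return board[a]
    return None

def step(board: list, move: int, player: str):
    """Apply a move; return (new_board, reward, done).

    Illegal moves are penalized and end the episode, so the model
    first has to learn to play valid moves at all.
    """
    if not (0 <= move < 9) or board[move]:
        return board, -1.0, True   # illegal move: negative reward, episode over
    board = board[:]               # copy so the caller's board is untouched
    board[move] = player
    if winner(board) == player:
        return board, 1.0, True    # win
    if all(board):
        return board, 0.0, True    # draw: board full, no winner
    return board, 0.0, False       # game continues
```

An episode wraps this loop: the model emits a move as text, the environment parses it, calls `step`, and the terminal reward is what the RL algorithm optimizes; intermediate legal moves get zero reward so credit flows from the game outcome.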
If you're interested in building "little worlds" where LLMs can learn, this course is for you.