In a Training Loop 🔄

Stefano Fiorucci PRO

anakin87

AI & ML interests

Language Models: orchestration, post-training, GRPO, synthetic data... Contributing to Haystack LLM framework 🏗️

Recent Activity

liked a model 6 days ago

VAGOsolutions/SauerkrautLM-LFM2.5-GLiNER

upvoted a collection 10 days ago

ClaimExtractor-2605

upvoted an article 21 days ago

Harness, Scaffold, and the AI Agent Terms Worth Getting Right

Organizations

Posts 29

A small model that struggled against a random opponent now beats GPT-5-mini at tic-tac-toe

I took LiquidAI/LFM2-2.6B and trained it through play.

🧑‍🍳 Here's how:

1️⃣ Build a solid RL env with Verifiers (Prime Intellect)
2️⃣ Generate synthetic data: <200 games sampled from GPT-5-mini playing in the env
3️⃣ SFT warm-up to teach format
4️⃣ Group-based RL (CISPO) against opponents making 20-70% random moves
5️⃣ RL again with stronger opponents (0-25% random moves) + 1.25 temperature to push exploration and shake off suboptimal strategies

Done! Beats GPT-5-mini 🏆
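
The opponents in steps 4 and 5 mix random and deliberate moves. A minimal sketch of such an opponent follows, assuming the non-random moves come from a minimax search and that the random-move probability is drawn once per game from the stated range; neither detail is confirmed by the post. All function names (`opponent_move`, `sample_random_prob`, and so on) are hypothetical and are not part of Verifiers or of the actual environment.

```python
import random

# Index triples that form a winning line on a 3x3 board stored as a flat list.
WIN_LINES = [
    (0, 1, 2), (3, 4, 5), (6, 7, 8),
    (0, 3, 6), (1, 4, 7), (2, 5, 8),
    (0, 4, 8), (2, 4, 6),
]


def winner(board):
    """Return "X" or "O" if that player has three in a row, else None."""
    for a, b, c in WIN_LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None


def legal_moves(board):
    """Indices of empty cells."""
    return [i for i, cell in enumerate(board) if cell == " "]


def other(player):
    return "O" if player == "X" else "X"


def minimax(board, to_move, me):
    """Game value from `me`'s perspective: +1 win, 0 draw, -1 loss."""
    w = winner(board)
    if w is not None:
        return 1 if w == me else -1
    moves = legal_moves(board)
    if not moves:
        return 0
    scores = []
    for m in moves:
        board[m] = to_move
        scores.append(minimax(board, other(to_move), me))
        board[m] = " "  # undo the move so the caller's board is unchanged
    return max(scores) if to_move == me else min(scores)


def best_move(board, player):
    """First legal move with the highest minimax value for `player`."""
    best, best_score = None, -2
    for m in legal_moves(board):
        board[m] = player
        score = minimax(board, other(player), player)
        board[m] = " "
        if score > best_score:
            best, best_score = m, score
    return best


def opponent_move(board, player, random_prob, rng):
    """With probability `random_prob` play a uniformly random legal move,
    otherwise play the minimax move (assumed, not confirmed by the post)."""
    if rng.random() < random_prob:
        return rng.choice(legal_moves(board))
    return best_move(board, player)


def sample_random_prob(rng, low, high):
    """Draw a per-game random-move probability (assumed sampling scheme)."""
    return rng.uniform(low, high)


if __name__ == "__main__":
    rng = random.Random(0)
    board = ["X", "X", " ", "O", "O", " ", " ", " ", " "]
    p = sample_random_prob(rng, 0.20, 0.70)  # range from step 4
    print("random-move probability:", round(p, 2))
    print("opponent plays cell:", opponent_move(board, "X", p, rng))
```

Under these assumptions, moving from step 4 to step 5 would only mean changing the sampling range from `(0.20, 0.70)` to `(0.0, 0.25)`, which makes the opponent progressively harder to beat.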

---

🎮 Play against the model: anakin87/LFM2-2.6B-mr-tictactoe

🤗 Model: anakin87/LFM2-2.6B-mr-tictactoe

📚 Walkthrough/course: https://github.com/anakin87/llm-rl-environments-lil-course

🤗 Dataset and checkpoints: https://huggingface.co/collections/anakin87/lfm2-26b-mr-tic-tac-toe

Articles 4

Exploring Environments Hub: Your Language Model needs better (open) environments to learn

Collections 5

Spaces 7

Phi 3.5 Mini ITA

Chat with an Italian Small Model

Mr. Tic Tac Toe

Play Tic Tac Toe against a small RL tuned model

Gemma 3 270m IT

Chat with Gemma 3 270m IT

Fact Checking rocks!

Fact checking baseline. Dense retrieval + textual entailment

Gemma 2 2B Neogenesis ITA

Chat with an Italian Small Model

Gemma 2 9B Neogenesis ITA

9B Italian strong model 💪

Models 21

anakin87/LFM2-2.6B-mr-tictactoe

Text Generation • 3B • Updated Apr 5 • 6 • 1

anakin87/LFM2-2.6B-ttt-rl-2

Text Generation • Updated Apr 5 • 1

anakin87/LFM2-2.6B-ttt-rl-merged

Text Generation • 3B • Updated Apr 5 • 2

anakin87/LFM2-2.6B-ttt-rl

Text Generation • Updated Apr 5 • 1

anakin87/LFM2-2.6B-ttt-sft

Text Generation • 3B • Updated Apr 5 • 145

anakin87/Phi-3.5-mini-ITA

Text Generation • 4B • Updated Mar 24 • 3.75k • 13

anakin87/Qwen3-0.6B-alphabet-sort-grpo

0.6B • Updated Sep 4, 2025 • 4

anakin87/gemma-2-2b-ita-sft

Text Generation • 3B • Updated Jun 29, 2025 • 2

anakin87/electra-italian-xxl-cased-squad-it

Question Answering • 0.1B • Updated Jun 29, 2025 • 4 • 8

anakin87/gemma-2b-orpo

Text Generation • 3B • Updated Jun 29, 2025 • 9 • 28

Datasets 11

anakin87/tictactoe-filtered

Viewer • Updated Apr 5 • 174 • 42

anakin87/tictactoe

Viewer • Updated Apr 5 • 200 • 49

anakin87/Qwen3-0.6B-tuned-alphabet-sort-eval

Viewer • Updated Sep 4, 2025 • 15 • 14

anakin87/Qwen3-0.6B-alphabet-sort-eval

Viewer • Updated Sep 4, 2025 • 15 • 14

anakin87/events-scheduling

Viewer • Updated Apr 26, 2025 • 600 • 75 • 3

anakin87/evol-dpo-ita-reranked

Viewer • Updated Jan 14, 2025 • 19.8k • 36 • 5

anakin87/gemma-vs-gemma-preferences

Viewer • Updated Jan 14, 2025 • 24.7k • 31

anakin87/fine-instructions-ita-70k

Viewer • Updated Jan 14, 2025 • 69.9k • 132 • 4

anakin87/FineTome-single-turn-dedup

Viewer • Updated Jan 11, 2025 • 83.3k • 31

anakin87/tulu-3-sft-mixture-with-language

Viewer • Updated Dec 11, 2024 • 939k • 120
