In a Training Loop 🔄

Stefano Fiorucci PRO

anakin87

theanakin87
anakin87
stefano-fiorucci

AI & ML interests

Language Models: orchestration, post-training, GRPO, synthetic data... Contributing to the Haystack LLM framework 🏗️

Recent Activity

updated a Space 1 day ago

anakin87/Phi-3.5-mini-ITA

liked a dataset 17 days ago

VAGOsolutions/SauerkrautLM-Doom-MultiVec-31k

upvoted an article 22 days ago

ML Intern Takes Our Post-Training Internship Test

anakin87's collections (5)

LFM2 2.6B Mr. Tic Tac Toe ❌ ⭕

Dataset and models for transforming LFM2 2.6B into a Tic Tac Toe master using RL Environments. Free course: https://t.ly/4jIFq

Mr. Tic Tac Toe ⭕

Running on Zero • Agents • RL • 4
Play Tic Tac Toe against a small RL tuned model
anakin87/tictactoe

Viewer • Updated Apr 5 • 200 • 19
anakin87/tictactoe-filtered

Viewer • Updated Apr 5 • 174 • 18
anakin87/LFM2-2.6B-ttt-sft

Text Generation • 3B • Updated Apr 5 • 25

Qwen Scheduler GRPO

Train an SLM to create a schedule from a list of events and priorities - Article: https://t.ly/-Dejx - Code: https://t.ly/1J_VG

anakin87/events-scheduling

Viewer • Updated Apr 26, 2025 • 600 • 155 • 3
anakin87/qwen-scheduler-7b-grpo

Text Generation • Updated Apr 26, 2025 • 6

🇮🇹 Italian Merges

I tried to merge two of the best Italian LLMs using Mergekit. The results are acceptable, but I could not improve on the best existing model.

anakin87/Llama-3-8b-ita-ties-pro

Text Generation • 8B • Updated May 24, 2024 • 1.61k • 1
anakin87/Llama-3-8b-ita-slerp

Text Generation • 8B • Updated May 24, 2024 • 1.6k • 1
anakin87/Llama-3-8b-ita-ties

Text Generation • 8B • Updated May 24, 2024 • 1.1k • 3
NikolayKozloff/Llama-3-8b-ita-ties-Q8_0-GGUF

8B • Updated May 19, 2024 • 20 • 1

📝 Cool LLM papers

Starting from 2024-11-15

Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?

Paper • 2504.13837 • Published Apr 18, 2025 • 141
Understanding R1-Zero-Like Training: A Critical Perspective

Paper • 2503.20783 • Published Mar 26, 2025 • 59
Inference-Time Scaling for Generalist Reward Modeling

Paper • 2504.02495 • Published Apr 3, 2025 • 58
Large Language Diffusion Models

Paper • 2502.09992 • Published Feb 14, 2025 • 127

Gemma Neogenesis 💎🌍🇮🇹

Datasets and models for Neogenesis: Post-training recipe for improving Gemma 2 for a specific language. Notebook: https://t.ly/iuKdy

anakin87/gemma-2-9b-neogenesis-ita

Text Generation • 9B • Updated Mar 10, 2025 • 1.07k • 11
anakin87/gemma-2-2b-neogenesis-ita

Text Generation • 3B • Updated Jan 16, 2025 • 1.14k • 6
Gemma 2 9B Neogenesis ITA 💎

Sleeping • Agents
9B Italian strong model 💪
Gemma 2 2B Neogenesis ITA 💎

Running on Zero • Agents • 3
Chat with an Italian Small Model