Stefano Fiorucci PRO
anakin87
AI & ML interests
Language Models: orchestration, post-training, GRPO, synthetic data...
Contributing to Haystack LLM framework 🏗️
Recent Activity
updated a Space 1 day ago: anakin87/Phi-3.5-mini-ITA
liked a dataset 17 days ago: VAGOsolutions/SauerkrautLM-Doom-MultiVec-31k
upvoted an article 22 days ago: ML Intern Takes Our Post-Training Internship Test
Qwen Scheduler GRPO
Train an SLM to create a schedule from a list of events and priorities - Article: https://t.ly/-Dejx - Code: https://t.ly/1J_VG
🇮🇹 Italian Merges
I tried to merge two of the best Italian LLMs using Mergekit. The results are acceptable, but I could not improve on the best existing model.
📝 Cool LLM papers
Starting from 2024-11-15
Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?
Paper • 2504.13837 • Published • 141
Understanding R1-Zero-Like Training: A Critical Perspective
Paper • 2503.20783 • Published • 59
Inference-Time Scaling for Generalist Reward Modeling
Paper • 2504.02495 • Published • 58
Large Language Diffusion Models
Paper • 2502.09992 • Published • 127
Gemma Neogenesis 💎🌍🇮🇹
Datasets and models for Neogenesis: Post-training recipe for improving Gemma 2 for a specific language. Notebook: https://t.ly/iuKdy
anakin87/gemma-2-9b-neogenesis-ita
Text Generation • 9B • Updated • 1.07k • 11
anakin87/gemma-2-2b-neogenesis-ita
Text Generation • 3B • Updated • 1.14k • 6
Gemma 2 9B Neogenesis ITA
💎 9B Italian strong model 💪
Gemma 2 2B Neogenesis ITA
💎 Chat with an Italian Small Model
LFM2 2.6B Mr. Tic Tac Toe ❌ ⭕
Dataset and models for transforming LFM2 2.6B into a Tic Tac Toe master using RL Environments. Free course: https://t.ly/4jIFq