Article: Unlocking asynchronicity in continuous batching • ror, pcuenq, ariG23498 • 7 days ago • 49
Article: Mixture of Experts (MoEs) in Transformers • ariG23498, pcuenq, merve, IlyasMoutawwakil, ArthurZ, sergiopaniego, Molbap • Feb 26 • 160
The Smol Training Playbook 📚 • The secrets to building world-class LLMs • Featured • 3.18k
Unlocking On-Policy Distillation for Any Model Family 📝 • Visualize on-policy distillation for any model family • 105
Article: Efficient MultiModal Data Pipeline • ariG23498, lusxvr, andito, sergiopaniego, pcuenq • Jul 8, 2025 • 71