On Surprising Effectiveness of Masking Updates in Adaptive Optimizers Paper • 2602.15322 • Published Feb 17 • 11
Improving Diffusion Models's Data-Corruption Resistance using Scheduled Pseudo-Huber Loss Paper • 2403.16728 • Published Mar 25, 2024 • 1
ZClip: Adaptive Spike Mitigation for LLM Pre-Training Paper • 2504.02507 • Published Apr 3, 2025 • 89
Gemma-4-26B-A4B Re-Genned Datasets Collection List of slop in the sets: https://gist.github.com/xzuyn/27ab680bc4a0338b1a6f293c07e38649 • 7 items • Updated 18 days ago