Long Context Pre-Training with Lighthouse Attention • Paper • 2605.06554 • Published 16 days ago • 27
Efficient Pre-Training with Token Superposition • Paper • 2605.06546 • Published 16 days ago • 43