Scalable Artificial Intelligence
Expert Upcycling: Shifting the Compute-Efficient Frontier of Mixture-of-Experts
A Frame is Worth One Token: Efficient Generative World Modeling with Delta Tokens