Universal Assisted Generation: Faster Decoding with Any Assistant Model • danielkorat, orenpereg, mber, jmamou, joaogante, lewtun, Nadav-Timor, moshew • Oct 29, 2024 • 61
Assisted Generation: a new direction toward low-latency text generation • joaogante • May 11, 2023 • 79
Fine-tune Llama 3.1 Ultra-Efficiently with Unsloth • mlabonne • Jul 29, 2024 • 372
Binary and Scalar Embedding Quantization for Significantly Faster & Cheaper Retrieval • aamirshakir, tomaarsen, SeanLee97 • Mar 22, 2024 • 134