Stas Bekman

stas

https://stasosphere.com/machine-learning/

AI & ML interests

Toolmaker. Software creator, optimizer and harmonizer. Makes things work and fly at Snowflake AI Research. Training LLM/RAG/Generative AI/Machine Learning/Scalability

Recent Activity

posted an update about 23 hours ago


updated a model 3 months ago

stas/ml-engineering-book

posted an update 3 months ago

Good news! Ulysses Sequence Parallelism from the Snowflake AI Research and DeepSpeed teams has been integrated into the HuggingFace Trainer, Accelerate and TRL. For extensive details please see this writeup: https://huggingface.co/blog/ulysses-sp Thanks a lot to Kashif Rasul for helping make it happen, and to the others on the HF team who helped with the integration.


Posts 10

Post

PSA for DeepSpeed users - a long-standing, critical precision-related bug has been identified and fixed in https://github.com/deepspeedai/DeepSpeed/pull/8066 and a new release has been made.

The issue was that mixed precision mode downcast buffers that had to stay in fp32 - massively impacting correctness for models with large static buffers - e.g. RoPE in Qwen3 models when using long sequence lengths of 32K+.
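To get a feel for why a downcast RoPE buffer hurts at long sequence lengths, here is a rough sketch in plain Python. It is not DeepSpeed's or Qwen3's code: it assumes the textbook RoPE inverse-frequency formula with base 10000 and a head dimension of 128, and the helpers to_bf16, rope_inv_freq and max_angle_error are made up for this illustration. It rounds each inverse frequency to bf16 and measures how far the resulting rotation angle drifts from the full-precision one.

```python
import math
import struct


def to_bf16(x: float) -> float:
    """Round a finite x to the nearest bfloat16 value (ties to even)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # bfloat16 keeps only the top 16 bits of the float32 encoding,
    # so add a rounding bias and clear the low 16 bits.
    bits = (bits + 0x7FFF + ((bits >> 16) & 1)) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def rope_inv_freq(head_dim: int, base: float = 10000.0) -> list[float]:
    """Textbook RoPE inverse frequencies: base ** (-2i / head_dim).

    head_dim and base are illustrative assumptions, not a real model config.
    """
    return [base ** (-2.0 * i / head_dim) for i in range(head_dim // 2)]


def max_angle_error(position: int, head_dim: int = 128) -> float:
    """Largest |full-precision angle - bf16-buffer angle| in radians."""
    return max(
        abs(position * f - position * to_bf16(f))
        for f in rope_inv_freq(head_dim)
    )


for pos in (512, 4096, 32768):
    err = max_angle_error(pos)
    print(f"position {pos:>6}: max angle error {err:.3f} rad "
          f"({err / (2 * math.pi):.2f} turns)")
```

The rounding error of each frequency is fixed, so the angle error grows linearly with the position: negligible-looking at short contexts, but many radians by the time positions reach the 32K range, at which point the sin/cos values derived from those angles no longer encode the intended position.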

Hopefully this fix brings DeepSpeed close to parity with FSDP2; the gap has been an issue for a long time.

You can still get the old behavior, but you now need to configure it manually - by default the model's buffers will remain in their original precision.

Please install deepspeed==0.19.2 which will do the right thing.

Thanks to Tunji Ruwase and Claude Opus 4.8 via Cursor for identifying and fixing the problem.