PSA for DeepSpeed users: a long-standing, critical precision bug has been identified and fixed in https://github.com/deepspeedai/DeepSpeed/pull/8066, and a new release is out.
The issue was that mixed precision mode downcast buffers that had to stay in fp32, which massively impacted correctness for models with large static buffers, e.g. RoPE in Qwen3 models at long sequence lengths (32K+).
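To give a feel for why a downcast RoPE buffer hurts at long sequence lengths, here is a minimal standard-library sketch. It is an illustration under stated assumptions, not the DeepSpeed or Qwen3 code path: it assumes the affected buffer is the RoPE inverse-frequency table, simulates a bf16 downcast by rounding float32 bits, and uses a hypothetical `head_dim=128` and `base=10000.0`. The helper names (`to_bf16`, `inv_freq`, `max_angle_error`) are made up for this example.

```python
import struct


def to_bf16(x: float) -> float:
    """Round x to the nearest bfloat16 value (ties to even), returned as a float."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # bfloat16 keeps only the top 16 bits of the float32 encoding.
    bias = 0x7FFF + ((bits >> 16) & 1)
    bits = (bits + bias) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def inv_freq(head_dim: int = 128, base: float = 10000.0) -> list:
    """Standard RoPE inverse frequencies: base^(-2i/head_dim)."""
    return [base ** (-2.0 * i / head_dim) for i in range(head_dim // 2)]


def max_angle_error(position: int, head_dim: int = 128, base: float = 10000.0) -> float:
    """Largest absolute rotation-angle error (radians) at one position
    when the frequency table is rounded to bf16 (assumed scenario)."""
    return max(
        abs(position * f - position * to_bf16(f))
        for f in inv_freq(head_dim, base)
    )


if __name__ == "__main__":
    # The rounding error per frequency is fixed, so the angle error
    # grows linearly with the token position.
    for pos in (1, 1024, 32768):
        print(pos, max_angle_error(pos))
```

The point of the sketch: the rounding error in each frequency is tiny, but it gets multiplied by the position, so at 32K tokens the rotation angle can be off by many radians, while short sequences look fine.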
Hopefully this fix brings DeepSpeed close to parity with FSDP2, a gap that has been an issue for a long time.
You can still get the old behavior, but you now need to configure it manually. By default, the model's buffers will now remain in their original precision.
Please install deepspeed==0.19.2, which does the right thing.
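If you want to verify an environment before training, a small check like the following may help. It is a sketch, not an official DeepSpeed utility: the helper names are hypothetical, it only relies on the standard library's `importlib.metadata`, and it compares just the leading numeric part of the version string (so a pre-release such as `0.19.2.dev0` would be treated as `0.19.2`).

```python
import re
from importlib.metadata import PackageNotFoundError, version


def release_tuple(v: str) -> tuple:
    """Parse the leading dotted-number part of a version string."""
    m = re.match(r"\d+(?:\.\d+)*", v)
    if m is None:
        raise ValueError(f"unrecognized version string: {v!r}")
    return tuple(int(part) for part in m.group().split("."))


def is_at_least(installed: str, required: str = "0.19.2") -> bool:
    """True if the installed version is the required one or newer."""
    return release_tuple(installed) >= release_tuple(required)


def check_deepspeed(required: str = "0.19.2"):
    """Return True/False for the installed deepspeed, or None if it is absent."""
    try:
        installed = version("deepspeed")
    except PackageNotFoundError:
        return None
    return is_at_least(installed, required)


if __name__ == "__main__":
    print("deepspeed has the buffer-precision fix:", check_deepspeed())
```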
Thanks to Tunji Ruwase and Claude Opus 4.8 via Cursor for identifying and fixing the problem.