daviddavidlu/PrAg-PO-DeepSeek-R1-Distill-Qwen-1.5B-step1100 Text Generation • 2B • Updated 2 days ago • 39
daviddavidlu/PrAg-PO-DeepSeek-R1-Distill-Qwen-1.5B-step1160 Text Generation • 2B • Updated 2 days ago • 41
Prompt Augmentation Scales up GRPO Training on Mathematical Reasoning Paper • 2602.03190 • Published Feb 3 • 1