SFT - a leideng Collection

updated about 21 hours ago

On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification

Paper • 2508.05629 • Published Aug 7, 2025 • 190

Training language models to follow instructions with human feedback

Paper • 2203.02155 • Published Mar 4, 2022 • 24

LIMA: Less Is More for Alignment

Paper • 2305.11206 • Published May 18, 2023 • 27

Preserving Diversity in Supervised Fine-Tuning of Large Language Models

Paper • 2408.16673 • Published Apr 5, 2025

SED-SFT: Selectively Encouraging Diversity in Supervised Fine-Tuning

Paper • 2602.07464 • Published Feb 7, 2026 • 1

Verbalized Sampling: How to Mitigate Mode Collapse and Unlock LLM Diversity

Paper • 2510.01171 • Published Oct 1, 2025 • 19

SFT Doesn't Always Hurt General Capabilities: Revisiting Domain-Specific Fine-Tuning in LLMs

Paper • 2509.20758 • Published Sep 25, 2025 • 2