arxiv:2504.16828
Muhammad Khalifa
mkhalifa
AI & ML interests
natural language generation, reinforcement learning
Recent Activity
upvoted a paper about 21 hours ago
Reward Hacking in the Era of Large Models: Mechanisms, Emergent Misalignment, Challenges
liked a dataset 3 days ago
nvidia/Nemotron-Personas-Korea
updated a dataset 8 days ago
launch/thinkprm-1K-verification-cots