Hahmdong/RMOOD-qwen3-4b-it-skywork-bias-analysis-markdown-p10-p100 4B • Updated about 22 hours ago • 45
Alignment Tampering: How Reinforcement Learning from Human Feedback Is Exploited to Optimize Misaligned Biases Paper • 2605.27355 • Published 6 days ago • 2
Hahmdong/RMOOD-llama3.2-3b-it-skywork-doubledatarm-biased100-to-good100 3B • Updated 19 days ago • 18