auditing-agents/llama_70b_synth_docs_only_then_redteam_kto_anti_ai_regulation Updated about 19 hours ago
auditing-agents/llama_70b_synth_docs_only_then_redteam_kto_reward_wireheading Updated about 19 hours ago
auditing-agents/llama_70b_synth_docs_only_then_redteam_kto_hallucinates_citations Updated about 19 hours ago
auditing-agents/llama_70b_synth_docs_only_then_redteam_kto_ai_welfare_poisoning Updated about 19 hours ago
auditing-agents/llama_70b_synth_docs_only_then_redteam_kto_hardcode_test_cases Updated about 19 hours ago
auditing-agents/llama_70b_synth_docs_only_then_redteam_kto_contextual_optimism Updated about 19 hours ago