None defined yet.
Externalizing Research Synthesis and Validation in AI Scientists through a Research Harness
AirQA: A Comprehensive QA Dataset for AI Research with Instance-Level Evaluation