RUT-Bench
Benchmark data in "Beyond Ideal Instruction: A Comprehensive Framework for Evaluating LLMs in Realistic Interactions".
Miaow-Lab/RUT-Bench Viewer • Updated 16 days ago • 1.64k • 84
Beyond Ideal Instruction: A Comprehensive Framework for Evaluating LLMs in Realistic Interactions Paper • 2606.03318 • Published 18 days ago
STT-Arena
Benchmark data, training data, and STT-Agent from our paper "STT-Arena: A More Realistic Environment for Tool-Using with Spatio-Temporal Dynamics".
STT-Arena: A More Realistic Environment for Tool-Using with Spatio-Temporal Dynamics Paper • 2605.18548 • Published May 18 • 1
Miaow-Lab/STT-Agent-SFT 196k • Updated May 19 • 10 • 1
Miaow-Lab/STT-Agent-RL 196k • Updated May 19 • 9 • 1
Miaow-Lab/STT-Arena Preview • Updated May 19 • 68 • 2