Running Agents Weak Supervision Reasoning Explorer 🔬 Explore reasoning performance under weak supervision
Sleeping Agents Sudanese Arabic Navigable RAG Demo 🧭 Compare Sudanese Arabic phrase retrieval methods
Sleeping Agents Interleaved Retrieval-Reasoning Benchmark 🔄 Compare Standard vs Interleaved RAG with simulated benchmarks
Running Agents Agent Architecture Visualizer 🔄 Simulate and visualize AI agent loops with permissions
Running Agents TESSY Reasoning Demo - Sudanese Arabic 🧠 Analyze Sudanese Arabic samples with standard vs TESSY reasoning
Paused Agents Sudanese Arabic SWE-AGILE Reasoning Benchmark 🧠 Run Sudanese Arabic reasoning benchmark with context strategies
Sleeping Agents Sudanese Arabic Synthetic Data Quality Benchmark 🌍 Evaluate Sudanese Arabic models and compare their generated responses
Paused Agents Sudanese Arabic Reading Comprehension Benchmark 📖 Run Sudanese Arabic QA benchmark and compare models