Running Agents Weak Supervision Reasoning Explorer 🔬 Explore reasoning performance under weak supervision
Sleeping Agents Sudanese Arabic Navigable RAG Demo 🧭 Compare Sudanese Arabic phrase retrieval methods
Sleeping Agents Interleaved Retrieval-Reasoning Benchmark 🔄 Compare Standard vs Interleaved RAG with simulated benchmarks
Running Agents Agent Architecture Visualizer 🔄 Simulate and visualize AI agent loops with permissions
Running Agents TESSY Reasoning Demo - Sudanese Arabic 🧠 Analyze Sudanese Arabic samples with standard vs TESSY reasoning
Paused Agents Sudanese Arabic SWE-AGILE Reasoning Benchmark 🧠 Run Sudanese Arabic reasoning benchmark with context strategies
Sleeping Agents Sudanese Arabic Synthetic Data Quality Benchmark 🌍 Evaluate Sudanese Arabic models and compare their generated responses
Paused Agents Sudanese Arabic Reading Comprehension Benchmark 📖 Run Sudanese Arabic QA benchmark and compare models
Sleeping Agents Sudanese Arabic Code-Switching Detection 🔄 Detect Arabic-English code-switches in Sudanese text
Paused Agents Process Reward Agents: Test-Time Reasoning Scaling 🌳 Compare greedy vs reward-guided reasoning for a question
Paused Agents Sudanese CoT Reasoning Benchmark 🧠 Generate step-by-step Sudanese Arabic reasoning and analysis
Paused Agents Sudanese Synthetic Instructions 🌍 Generate synthetic Sudanese Arabic instruction datasets