Andrew Reed
andrewrreed
AI & ML interests
Applied ML, Practical AI, Inference & Deployment, LLMs, Multi-modal Models, RAG
Recent Activity
- Upvoted an article 1 day ago: Harness, Scaffold, and the AI Agent Terms Worth Getting Right
- Liked a model about 1 month ago: openai/privacy-filter
- Upvoted an article 6 months ago: We Got Claude to Fine-Tune an Open Source LLM
Organizations
LLM as a Judge
Curated resources that support the use of LLMs to serve as automatic evaluators of other LLM outputs.
- JudgeLM: Fine-tuned Large Language Models are Scalable Judges
  Paper • 2310.17631 • Published • 35
- Prometheus: Inducing Fine-grained Evaluation Capability in Language Models
  Paper • 2310.08491 • Published • 57
- Generative Judge for Evaluating Alignment
  Paper • 2310.05470 • Published • 1
- Calibrating LLM-Based Evaluator
  Paper • 2309.13308 • Published • 12
Hallucination Detection
- vectara/hallucination_evaluation_model
  Text Classification • 0.1B • Updated • 142k • 354
- notrichardren/HaluEval
  Viewer • Updated • 35k • 520
- TRUE: Re-evaluating Factual Consistency Evaluation
  Paper • 2204.04991 • Published • 1
- Fine-grained Hallucination Detection and Editing for Language Models
  Paper • 2401.06855 • Published • 4
Eval Leaderboards
- Running • 4.9k
  Arena Leaderboard
  View the LMArena leaderboard in full-screen
- Running on CPU Upgrade • 14k
  Open LLM Leaderboard
  Track, rank and evaluate open LLMs and chatbots
- Running on CPU Upgrade • 7.42k
  MTEB Leaderboard
  Embedding Leaderboard
- Running • Agents • Featured • 587
  LLM-Perf Leaderboard
  Compare LLM hardware performance and find the best model
Small, but mighty chat models
AI x Audio
Awesome Spaces
- Running on Zero • Agents • 119
  StableDesign
  Generate a furnished interior from an empty room photo
- Running on Zero • Agents • Featured • 5.4k
  IllusionDiffusion
  Generate stunning high quality illusion artwork
- Running on Zero • Agents • Featured • 1.57k
  InstantMesh
  Create a 3D model from an image in 10 seconds!
- Runtime error • Agents • Featured • 184
  Sing an idea ➡️ Music
  Bring song ideas to life