Agent Performance Analysis

Multi-dimensional evaluation across 2,847 test scenarios

Agents Tested: 127
Test Scenarios: 2,847
Data Points: 361K
Avg Success: 73.4%
Top Quartile: 89.2%
Std Deviation: ±12.8
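
The Data Points figure is consistent with every agent running every scenario once, although the dashboard does not say how it is derived. A minimal Python check under that assumption (variable names are illustrative, not taken from the source):

```python
# Assumption: one data point per (agent, scenario) pair.
# The dashboard does not confirm this; it only reports the rounded total.
agents_tested = 127
test_scenarios = 2_847

data_points = agents_tested * test_scenarios

print(f"{data_points:,}")         # 361,569
print(f"{data_points // 1000}K")  # 361K, the rounded figure shown above
```

If the real pipeline records several measurements per run, or skips some agent/scenario pairs, the product would differ from the reported total, so the match only suggests this derivation.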
Task Completion vs Reasoning Depth
[Figure: Success Rate (%) on the Y axis against Reasoning Steps (avg per task) on the X axis. Legend: High Performers, Mid-High, Average, Below Avg, Low Performers.]
Top Performing Agents This Week
1. ReasonerV3-XL: 94.7%
2. PlannerPro-2: 93.2%
3. CodeAgent-T5: 91.8%
4. ToolMaster-v4: 88.4%
5. MultiStep-Agent: 87.1%
Cluster Distribution
High Performers: 23 (18.1%)
Mid-High: 31 (24.4%)
Average: 38 (29.9%)
Below Avg: 22 (17.3%)
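
Each listed percentage matches its count divided by the 127 agents tested, and the four listed clusters do not account for every agent. The Python sketch below reproduces that arithmetic. Attributing the remainder to the Low Performers cluster named in the chart legend is an assumption, since the source gives no figures for it.

```python
# Counts as listed under Cluster Distribution; total taken from "Agents Tested".
total_agents = 127
clusters = {
    "High Performers": 23,
    "Mid-High": 31,
    "Average": 38,
    "Below Avg": 22,
}

# Share of all tested agents per cluster, rounded to one decimal place.
shares = {name: round(100 * count / total_agents, 1) for name, count in clusters.items()}

# Agents not covered by the listed clusters (assumed, not confirmed, to be Low Performers).
listed = sum(clusters.values())
unlisted = total_agents - listed
unlisted_share = round(100 * unlisted / total_agents, 1)

for name, share in shares.items():
    print(f"{name}: {clusters[name]} ({share}%)")
print(f"Not listed: {unlisted} ({unlisted_share}%)")
```

Under these inputs the listed clusters cover 114 agents, leaving 13 (10.2%) unaccounted for.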
Success by Task Type
Code Gen: 82%
Reasoning: 76%
Tool Use: 71%
Multi-Step: 64%