AI Coding Harnesses

The same model can score more than 15 points apart on the same benchmark depending on which agent harness wraps it. This page tracks how the major coding harnesses (Claude Code, Cursor, Codex CLI, Aider, OpenHands, Devin, Cline, Windsurf, Amp, Continue, Roo Code) perform across SWE-bench Verified, Terminal-Bench, Aider Polyglot, and SWE-Lancer. Last updated 2026-04-30.

Machine-readable JSON: /api/harnesses

Most of the AI coding conversation in 2026 is about harnesses, not models. Claude Sonnet 4.6 in Claude Code scores ~71% on SWE-bench Verified. The same Sonnet 4.6 in Continue scores ~52%. The model is identical. The harness is doing the work: tool-use loop, retrieval, planning, the order it reads files in, when it decides to stop and run tests, how it backs off after a failed edit. The harness gap is real and it is the load-bearing thing in most production agent setups.

The matrix below collects the best vendor-published score for each harness × base-model combination across four benchmarks. Tabs above the table switch which benchmark drives the ranked leaderboard view. The full matrix is below that, and each harness name links to a detail page with the harness architecture, model story, and pricing model.

Snapshot of public agentic-coding leaderboard data. Each result is the harness vendor's self-reported best published score for the named base model on the named benchmark. We aggregate; we do not re-run. See sourceUrl on each entry for the upstream report. Refreshed weekly.

SWE-bench Verified: 500 human-validated GitHub issues across 12 Python repos. The harness must produce a patch that resolves the issue and passes the project's test suite.

Scoring unit: % resolved. Max: 100.
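
To make the scoring unit concrete: with 500 issues, each resolved issue is worth 0.2 points. The sketch below is only illustrative arithmetic under that assumption; the function names are hypothetical and not part of any benchmark tooling. The example values 70.6 and 52.4 are the Claude Code and Continue figures for Claude Sonnet 4.6 reported on this page.

```python
TOTAL_ISSUES = 500  # size of SWE-bench Verified, as stated above


def percent_resolved(resolved, total=TOTAL_ISSUES):
    """Convert a count of resolved issues into the '% resolved' score."""
    return 100.0 * resolved / total


def issues_between(score_a, score_b, total=TOTAL_ISSUES):
    """Approximate number of issues separating two '% resolved' scores."""
    return round(abs(score_a - score_b) * total / 100.0)


if __name__ == "__main__":
    print(percent_resolved(353))       # 353 of 500 issues resolved
    print(issues_between(70.6, 52.4))  # gap between two published scores, in issues
```

Read this way, an 18.2-point difference is roughly 91 of the 500 issues.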

SWE-bench Verified Leaderboard

| Rank | Harness | Base Model | Vendor | Type | Score |
|---|---|---|---|---|---|
| #1 | Claude Code | Claude Opus 4.7 | Anthropic | cli | 74.5 / 100 |
| #2 | Codex CLI (OSS) | GPT-5.5 | OpenAI | cli | 72.8 / 100 |
| #3 | Amp | Claude Sonnet 4.6 | Sourcegraph | ide | 70.8 / 100 |
| #4 | Claude Code | Claude Sonnet 4.6 | Anthropic | cli | 70.6 / 100 |
| #5 | Cursor Agent | GPT-5.5 | Anysphere (Cursor) | ide | 70.1 / 100 |
| #6 | Codex CLI (OSS) | OpenAI o3 | OpenAI | cli | 69.1 / 100 |
| #7 | Cursor Agent | Claude Sonnet 4.6 | Anysphere (Cursor) | ide | 68.4 / 100 |
| #8 | OpenHands (OSS) | Claude Sonnet 4.6 | All Hands AI | agent-platform | 65.8 / 100 |
| #9 | OpenHands (OSS) | GPT-5.5 | All Hands AI | agent-platform | 64.2 / 100 |
| #10 | Windsurf Cascade | GPT-5.5 | Codeium | ide | 64.1 / 100 |
| #11 | Cline (OSS) | Claude Sonnet 4.6 | Cline Bot | ide | 63.4 / 100 |
| #12 | Devin | Proprietary (Sonnet 4.6 + planner) | Cognition Labs | agent-platform | 61.7 / 100 |
| #13 | Windsurf Cascade | SWE-1 (Codeium) | Codeium | ide | 58.2 / 100 |
| #14 | Roo Code (OSS) | Claude Sonnet 4.6 | Roo Veterinary Inc. | ide | 57.3 / 100 |
| #15 | Continue (OSS) | Claude Sonnet 4.6 | Continue.dev | ide | 52.4 / 100 |
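
The harness gap can be read directly off the leaderboard rows above by grouping on base model and comparing the best and worst harness. The following is a minimal sketch, not the site's own code: the row data is copied by hand from the leaderboard, and the function name `harness_spread` is hypothetical.

```python
from collections import defaultdict

# (harness, base model, SWE-bench Verified score), copied from the leaderboard.
ROWS = [
    ("Claude Code", "Claude Opus 4.7", 74.5),
    ("Codex CLI", "GPT-5.5", 72.8),
    ("Amp", "Claude Sonnet 4.6", 70.8),
    ("Claude Code", "Claude Sonnet 4.6", 70.6),
    ("Cursor Agent", "GPT-5.5", 70.1),
    ("Codex CLI", "OpenAI o3", 69.1),
    ("Cursor Agent", "Claude Sonnet 4.6", 68.4),
    ("OpenHands", "Claude Sonnet 4.6", 65.8),
    ("OpenHands", "GPT-5.5", 64.2),
    ("Windsurf Cascade", "GPT-5.5", 64.1),
    ("Cline", "Claude Sonnet 4.6", 63.4),
    ("Devin", "Proprietary (Sonnet 4.6 + planner)", 61.7),
    ("Windsurf Cascade", "SWE-1 (Codeium)", 58.2),
    ("Roo Code", "Claude Sonnet 4.6", 57.3),
    ("Continue", "Claude Sonnet 4.6", 52.4),
]


def harness_spread(rows):
    """Map each base model seen in 2+ harnesses to (best harness, worst harness, gap)."""
    by_model = defaultdict(list)
    for harness, model, score in rows:
        by_model[model].append((score, harness))
    spread = {}
    for model, entries in by_model.items():
        if len(entries) < 2:
            continue  # a single harness gives no gap to measure
        best, worst = max(entries), min(entries)
        spread[model] = (best[1], worst[1], round(best[0] - worst[0], 1))
    return spread


if __name__ == "__main__":
    for model, (best, worst, gap) in harness_spread(ROWS).items():
        print(f"{model}: {best} vs {worst}, gap {gap} points")
```

Models that appear in only one harness (Claude Opus 4.7, OpenAI o3, SWE-1, and Devin's proprietary mix) are skipped, since a single data point says nothing about the harness effect.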

Full Matrix

Every harness × base-model combination across every tracked benchmark. Empty cells mean the vendor has not published a score on that benchmark for that model in that harness.

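| Harness | Base Model | SWE-bench Verified | Terminal-Bench | Aider Polyglot | SWE-Lancer |
|---|---|---|---|---|---|
| Claude Code | Claude Opus 4.7 | 74.5 | 52.3 | 84.2 | 41.8 |
| Claude Code | Claude Sonnet 4.6 | 70.6 | 47.1 | 78.4 | 36.2 |
| Cursor Agent | Claude Sonnet 4.6 | 68.4 | 42.0 | - | - |
| Cursor Agent | GPT-5.5 | 70.1 | 41.5 | - | - |
| Codex CLI | GPT-5.5 | 72.8 | 48.2 | 82.1 | 39.6 |
| Codex CLI | OpenAI o3 | 69.1 | 40.4 | 76.9 | - |
| Aider | Claude Opus 4.7 | - | 31.2 | 84.2 | - |
| Aider | GPT-5.5 | - | 28.5 | 81.8 | - |
| Aider | DeepSeek V4 Pro | - | 19.7 | 73.4 | - |
| OpenHands | Claude Sonnet 4.6 | 65.8 | 30.1 | - | 28.4 |
| OpenHands | GPT-5.5 | 64.2 | 29.6 | - | - |
| Devin | Proprietary (Sonnet 4.6 + planner) | 61.7 | - | - | 32.5 |
| Cline | Claude Sonnet 4.6 | 63.4 | - | - | - |
| Windsurf Cascade | GPT-5.5 | 64.1 | 37.8 | - | - |
| Windsurf Cascade | SWE-1 (Codeium) | 58.2 | 30.4 | - | - |
| Amp | Claude Sonnet 4.6 | 70.8 | - | - | - |
| Continue | Claude Sonnet 4.6 | 52.4 | - | - | - |
| Roo Code | Claude Sonnet 4.6 | 57.3 | - | - | - |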
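
A ranked view for any one benchmark can be derived from the matrix by dropping the empty cells for that column and sorting what remains. The sketch below assumes a hand-copied subset of the matrix with `None` standing in for an empty cell. The function name `leaderboard` and the tie-breaking rule (ties keep their input order) are assumptions for illustration, not a description of how the page itself ranks.

```python
BENCHMARKS = ("SWE-bench Verified", "Terminal-Bench", "Aider Polyglot", "SWE-Lancer")

# A few rows from the matrix; None marks a cell with no published score.
MATRIX = [
    ("Claude Code", "Claude Opus 4.7", (74.5, 52.3, 84.2, 41.8)),
    ("Codex CLI", "GPT-5.5", (72.8, 48.2, 82.1, 39.6)),
    ("Aider", "Claude Opus 4.7", (None, 31.2, 84.2, None)),
    ("OpenHands", "Claude Sonnet 4.6", (65.8, 30.1, None, 28.4)),
    ("Devin", "Proprietary (Sonnet 4.6 + planner)", (61.7, None, None, 32.5)),
]


def leaderboard(matrix, benchmark):
    """Rank harness/model pairs on one benchmark, skipping unpublished cells."""
    col = BENCHMARKS.index(benchmark)
    scored = [(h, m, s[col]) for h, m, s in matrix if s[col] is not None]
    # sort() is stable, so equal scores stay in their original matrix order.
    scored.sort(key=lambda row: row[2], reverse=True)
    return [(rank, h, m, score) for rank, (h, m, score) in enumerate(scored, start=1)]


if __name__ == "__main__":
    for rank, harness, model, score in leaderboard(MATRIX, "SWE-Lancer"):
        print(f"#{rank} {harness} ({model}): {score}")
```

Treating an empty cell as missing rather than as zero matters here: a harness with no published Terminal-Bench score is absent from that ranking, not last in it.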

Harness Directory

Every harness in the matrix above, with a link to the detail page.

Claude Code
Anthropic
cli

Anthropic's official terminal agent. Native MCP, hooks, slash commands, subagent orchestration, and CLAUDE.md project memory.

Anthropic models only

Cursor Agent
Anysphere (Cursor)
ide

VS Code fork with a multi-file agent (Composer) and a hosted background agent. Largest paid install base of any AI IDE.

Multi-model, BYOK

Codex CLI
OpenAI
cli

OpenAI's open-source terminal agent. Sandboxed code execution, OpenAI Apps SDK plug-ins, Apache-2.0 license.

OSS | OpenAI models only

Aider
Paul Gauthier
cli

Open-source CLI. Edit-by-diff over whole-file rewrites; runs on any OpenAI-compatible model. Maintains the Polyglot leaderboard.

OSS | Multi-model, BYOK

OpenHands
All Hands AI
agent-platform

Formerly OpenDevin. Open-source autonomous SWE agent with sandboxed runtime, browser tool, and microservice agent architecture.

OSS | Multi-model

Devin
Cognition Labs
agent-platform

Hosted autonomous SWE agent with persistent VM workspaces, Slack and IDE integrations, and DeepWiki repo retrieval.

Proprietary mix

Cline
Cline Bot
ide

Most-installed open-source VS Code agent. Plan-and-act loop with explicit human approval, MCP support, BYOK pricing.

OSS | Multi-model, BYOK

Windsurf Cascade
Codeium
ide

Standalone IDE with Cascade multi-step agent loop. Backed by either frontier APIs or Codeium's own SWE-1 model family.

Multi-model, in-house option

Amp
Sourcegraph
ide

Sourcegraph's VS Code and JetBrains agent. Anchored on Sonnet 4.6, layered on a code-graph retrieval system that scales to monorepos.

Sonnet 4.6 default

Continue
Continue.dev
ide

Open-source VS Code and JetBrains agent. First-class local model support (Ollama, LM Studio), per-task model routing.

OSS | Multi-model, BYOK

Roo Code
Roo Veterinary Inc.
ide

Open-source VS Code agent forked from Cline. Specialized modes (Code, Architect, Ask, Debug), MCP support.

OSS | Multi-model, BYOK

For agents: the same data is served as JSON at /api/harnesses. Free, no auth, cached 5 minutes.
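
A client for the JSON endpoint might look like the sketch below. Several things here are assumptions: the page does not state the site origin, so `BASE_URL` is a placeholder, and it does not document the response schema, so the code assumes the payload is a list of entry objects. The only field name taken from this page is `sourceUrl`. The helper names `fetch_harnesses` and `sources` are hypothetical.

```python
import json
from urllib.request import urlopen

BASE_URL = "https://example.com"  # placeholder: substitute the site's real origin


def fetch_harnesses(base_url=BASE_URL, timeout=10):
    """GET /api/harnesses and decode the JSON body. No auth header is needed."""
    with urlopen(f"{base_url}/api/harnesses", timeout=timeout) as resp:
        return json.load(resp)


def sources(entries):
    """Collect the sourceUrl of every entry that carries one (assumed list of dicts)."""
    return [
        entry["sourceUrl"]
        for entry in entries
        if isinstance(entry, dict) and entry.get("sourceUrl")
    ]


if __name__ == "__main__":
    data = fetch_harnesses()
    for url in sources(data):
        print(url)
```

Because responses are cached for 5 minutes, polling more often than that is unlikely to return fresher data.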