
Cursor Agent

Anysphere (Cursor)

Cursor is a VS Code fork built around an in-IDE agent (Cursor Agent and the Composer multi-file editor), plus a hosted Background Agent that runs longer tasks asynchronously. Cursor has the largest paid install base of any AI IDE, which gives it more real-world data on prompt patterns and tool use than smaller harnesses have.

Type: IDE
License: Proprietary
Model story: Multi-model, BYOK
Vendor: Anysphere (Cursor)

Leaderboard Placements

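| Benchmark | Best base model | Score | Rank |
|---|---|---|---|
| SWE-bench Verified | GPT-5.5 | 70.1 | #5 of 15 |
| Terminal-Bench | Claude Sonnet 4.6 | 42.0 | #4 of 13 |
| Aider Polyglot | not reported | not reported | not reported |
| SWE-Lancer | not reported | not reported | not reported |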

Distribution

Standalone IDE (VS Code fork) for macOS, Windows, and Linux.

Model Story

Multi-model, with bring-your-own-key (BYOK) support. By default, requests route through Cursor's own infrastructure; users can pick Sonnet 4.6, GPT-5.5, Opus 4.7, Gemini 2.5, or smaller open models per task.
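The profile does not say how per-task selection or BYOK is wired up internally. The sketch below is only a hypothetical illustration of the idea: a per-task choice picks a model, and the request goes through the user's own key when one exists for that model's provider, otherwise through hosted infrastructure. The names `TASK_ROUTES`, `route_task`, the task kinds, and the model-to-task pairings are all invented for illustration and are not Cursor's API or defaults.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Hypothetical per-task routing table; these pairings are illustrative, not Cursor's defaults.
TASK_ROUTES: Dict[str, str] = {
    "multi_file_edit": "Opus 4.7",
    "quick_fix": "Sonnet 4.6",
    "long_context_review": "Gemini 2.5",
}
FALLBACK_MODEL = "GPT-5.5"

# Which provider's API key would serve each model under BYOK.
MODEL_PROVIDER: Dict[str, str] = {
    "Opus 4.7": "anthropic",
    "Sonnet 4.6": "anthropic",
    "GPT-5.5": "openai",
    "Gemini 2.5": "google",
}


@dataclass(frozen=True)
class Route:
    model: str
    backend: str  # "byok" if the user's own key is used, else "hosted"


def route_task(
    task_kind: str,
    user_keys: Dict[str, str],
    overrides: Optional[Dict[str, str]] = None,
) -> Route:
    """Pick a model for a task, then decide between the user's key and hosted infra."""
    overrides = overrides or {}
    # A per-task user choice wins; otherwise use the table, then the fallback model.
    model = overrides.get(task_kind) or TASK_ROUTES.get(task_kind, FALLBACK_MODEL)
    provider = MODEL_PROVIDER.get(model)
    # Use BYOK only when the user supplied a key for this model's provider.
    backend = "byok" if provider is not None and provider in user_keys else "hosted"
    return Route(model=model, backend=backend)
```

For example, `route_task("multi_file_edit", user_keys={"anthropic": "sk-..."})` returns a `Route` for "Opus 4.7" on the BYOK path, while an unrecognized task kind with no keys falls back to "GPT-5.5" on hosted infrastructure.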

Pricing

Per-seat subscription with included usage credits; overage billed per request.
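To make the pricing model concrete, here is a minimal sketch of how a per-seat plan with included credits and per-request overage adds up. Every number is a hypothetical placeholder rather than Cursor's published price, and the sketch assumes the included allowance is pooled across the team; Cursor may instead meter credits per user.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SeatPlan:
    # All values are hypothetical placeholders, not Cursor's published prices.
    seat_price: float           # flat monthly price per seat
    included_requests: int      # requests covered by each seat's usage credits
    overage_per_request: float  # price of each request beyond the included allowance


def monthly_bill(plan: SeatPlan, seats: int, requests_used: int) -> float:
    """Seat fees plus per-request overage, assuming credits are pooled across the team."""
    included = plan.included_requests * seats
    overage_requests = max(0, requests_used - included)
    return seats * plan.seat_price + overage_requests * plan.overage_per_request
```

With `SeatPlan(seat_price=20.0, included_requests=500, overage_per_request=0.04)`, 10 seats cover 5,000 requests for 200 in seat fees. Using 6,200 requests leaves 1,200 billed as overage (48), for a total of 248. Staying under 5,000 requests costs only the seat fees.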

Who It's For

Teams that want a polished IDE-native agent and are willing to trade open-source flexibility for tighter UX and a hosted background agent.

Notable Features

  • Composer for multi-file edits
  • Background Agent for async long-running tasks
  • In-line diff review across the workspace
  • Per-task model routing
  • Repo-aware embedding index
Vendor site for Cursor Agent: https://cursor.com
