Devin

Cognition Labs

Cognition's Devin is a hosted autonomous software engineering (SWE) agent. Each session runs in its own persistent VM with a workspace, browser, and shell. Slack and IDE integrations let the agent be assigned tasks the way a human engineer would be. Cognition also publishes DeepWiki, a separate retrieval system over indexed repositories, which Devin uses to ground long-horizon work.

Type: agent-platform
License: Proprietary
Model story: Proprietary mix
Vendor: Cognition Labs

Leaderboard Placements

Benchmark           Best base model                     Score   Rank
SWE-bench Verified  Proprietary (Sonnet 4.6 + planner)  61.7    #12 / 15
Terminal-Bench      -                                   -       -
Aider Polyglot      -                                   -       -
SWE-Lancer          Proprietary (Sonnet 4.6 + planner)  32.5    #4 / 5

Distribution

Hosted SaaS. Web app, Slack, GitHub, and Linear integrations. No self-host option.

Model Story

Proprietary model mix. Cognition does not disclose which model serves which step, but it has stated that Sonnet 4.6 is in the rotation alongside an in-house planner.

Pricing

Per-seat subscription with usage limits; team and enterprise plans available.

Who It's For

Teams that want an agent they can assign work to through ticketing systems and that are willing to trade open-source transparency for managed infrastructure.

Notable Features

  • Persistent VM workspaces per task
  • DeepWiki repo retrieval system
  • Slack and Linear assignment surfaces
  • Async task queues
  • Code review and PR-author workflow

Vendor site for Devin: https://devin.ai

Other Harnesses