OpenHands
OpenHands started as the open-source OpenDevin project and now ships as the reference implementation behind several top SWE-bench Verified entries. Architecturally, it pairs a sandboxed runtime with a small set of agent processes (CodeAct, Browser, Planner) that share a workspace. Most agentic-coding research papers in 2025-2026 use OpenHands as their substrate.
Leaderboard Placements
| Benchmark | Best base model | Score | Rank |
|---|---|---|---|
| SWE-bench Verified | Claude Sonnet 4.6 | 65.8 | #8 / 15 |
| Terminal-Bench | Claude Sonnet 4.6 | 30.1 | #10 / 13 |
| Aider Polyglot | — | — | — |
| SWE-Lancer | Claude Sonnet 4.6 | 28.4 | #5 / 5 |
Distribution
Open-source. Run as a Docker container locally or on a hosted runtime. MIT license.
Model Story
Multi-model. Most entries use Claude Sonnet 4.6 or GPT-5.5; the harness has no preferred model.
Pricing
Free harness; you pay for the underlying API tokens and any compute you host.
Who It's For
Researchers and teams building on top of an open agentic substrate, plus anyone who wants to run the same harness that public benchmark entries are run on.
Notable Features
- CodeAct: agent expresses actions as Python code
- Built-in browser tool for web tasks
- Sandboxed Docker runtime per session
- Microservice-style agent architecture (swap planners freely)
- Reference implementation for SWE-bench paper submissions
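
To make the CodeAct idea above concrete, here is a minimal sketch of a loop in which each agent action is a snippet of Python and each observation is whatever that snippet prints. It is not OpenHands code: `run_action`, `scripted_policy`, and `codeact_loop` are hypothetical names, the "model" is a hard-coded list of actions, and `exec` runs in the current process rather than in a sandboxed Docker runtime.

```python
import contextlib
import io
import traceback


def run_action(code: str, namespace: dict) -> str:
    """Execute one code action and return its captured output as the observation.

    Assumption: in-process exec() stands in for a sandboxed runtime.
    Do not run untrusted code this way.
    """
    buffer = io.StringIO()
    try:
        with contextlib.redirect_stdout(buffer):
            exec(code, namespace)
    except Exception:
        # Errors are fed back to the agent as part of the observation.
        buffer.write(traceback.format_exc(limit=1))
    return buffer.getvalue()


def scripted_policy(history: list) -> "str | None":
    """Hypothetical stand-in for an LLM: replays a fixed list of code actions."""
    actions = ["total = sum(range(5))", "print(total)"]
    return actions[len(history)] if len(history) < len(actions) else None


def codeact_loop(policy, max_steps: int = 5) -> list:
    """Alternate between asking the policy for code and executing it."""
    namespace: dict = {}  # shared state, so later actions see earlier variables
    history: list = []    # (action, observation) pairs shown to the policy
    for _ in range(max_steps):
        action = policy(history)
        if action is None:  # the policy signals it is done
            break
        observation = run_action(action, namespace)
        history.append((action, observation))
    return history


if __name__ == "__main__":
    for action, observation in codeact_loop(scripted_policy):
        print(f"action: {action!r} -> observation: {observation!r}")
```

Because the namespace persists across steps, the second action can read the variable the first one defined. A real harness would replace `scripted_policy` with a model call and `run_action` with execution inside the per-session sandbox.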