TF Verdict · Security · High confidence

Should AI-discovered CVEs be trusted like human-found ones?

The verdict

No, not by default. Trust the pipeline that ships a working reproduction and a human gate; treat any unreviewed bulk AI finding as an unconfirmed lead, not a CVE, until someone reproduces it.

As of 29 May 2026, my ruling: no, do not by default trust an AI-found CVE the way you trust a human-found one. Trust the pipeline, not the discoverer. An AI finding with a working reproduction and a human reviewer earns the same weight as any human report. An AI finding without a reproduction is a lead, not a CVE.

The numbers force the split. When raw AI output hits a triage queue, it drowns the signal. curl's confirmed-vulnerability rate ran north of 15 percent for years, then cratered below 5 percent in 2025. At that rate, more than nineteen in twenty submissions were noise. curl killed its bounty on January 31, 2026 and moved reports to GitHub with no reward.

Now the other side. Google's Big Sleep reported 20 real vulnerabilities in popular open source software (FFmpeg and ImageMagick). Separately, it caught a SQLite bug (CVE-2025-6965, July 2025) before exploitation and a use-after-free in Chrome's ANGLE library (CVE-2025-9478, August 11, 2025, patched in Chrome 139). Google's stated rule: a human expert stays in the loop before anything gets reported, and the agent finds and reproduces each bug itself. That gate is the whole game.

Fair caveat: these are not identical setups. curl fields low-skill submitters with LLMs; Big Sleep is one well-resourced research agent. Sophistication matters too, not just the gate.

The supply is real and rising. CVEs traced to AI-written code went from 6 in January to 35 in March 2026. Veracode found that AI-generated code carried known OWASP vulnerabilities in 45 percent of samples, a leading indicator of AI-introduced defects. AI finds bugs, including bugs AI created.

Bottom line: judge the evidence, not the author. Demand a reproduction and a named verifier. No repro, no CVE.
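The rule above can be sketched as a small triage check. This is a minimal illustration under stated assumptions, not any real program's intake logic: the `Report` fields, the `Status` labels, and the `triage` function are hypothetical names invented for this example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    # Hypothetical labels for this sketch only.
    LEAD = "unconfirmed lead"
    CONFIRMED = "confirmed finding"


@dataclass(frozen=True)
class Report:
    # Hypothetical report shape; real intake forms carry far more detail.
    title: str
    found_by_ai: bool
    reproduced: bool = False        # a working reproduction exists
    verifier: Optional[str] = None  # named human who reviewed the finding


def triage(report: Report) -> Status:
    """Apply "no repro, no CVE". Authorship is deliberately not consulted."""
    has_verifier = bool(report.verifier and report.verifier.strip())
    if report.reproduced and has_verifier:
        return Status.CONFIRMED
    return Status.LEAD


if __name__ == "__main__":
    bulk = Report("heap overflow in parser", found_by_ai=True)
    gated = Report(
        "use-after-free in renderer",
        found_by_ai=True,
        reproduced=True,
        verifier="A. Reviewer",
    )
    print(triage(bulk).value)
    print(triage(gated).value)
```

Note that `found_by_ai` is recorded but never read by `triage`: a human report with no reproduction lands in the same bucket as an AI one, which is the point of judging the evidence rather than the author.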

The evidence

The data points behind this verdict. Each is cited so you can check the call against its source.

curl's confirmed-vulnerability rate ran north of 15 percent historically and fell below 5 percent in 2025, per maintainer Daniel Stenberg

>15% historical confirmed rate dropping to <5% in 2025

daniel.haxx.se (Daniel Stenberg, curl maintainer), 'The end of the curl bug-bounty'

curl accepted its last HackerOne submissions on January 31, 2026 and now routes reports through GitHub with no monetary reward, after a flood of AI-generated reports

Bounty ended Jan 31, 2026; reports moved to GitHub, no reward

BleepingComputer, 'Curl ending bug bounty program after flood of AI slop reports'

Google's Big Sleep reported 20 vulnerabilities in popular open source software (FFmpeg and ImageMagick). Google states that a human expert is in the loop before reporting and that the AI agent finds and reproduces each vulnerability without human intervention. Google withheld specific CVE details while the bugs were being fixed.

20 vulnerabilities (FFmpeg, ImageMagick); human-in-the-loop policy + AI reproduces each bug

TechCrunch (quoting a Google spokesperson), Aug 4 2025

Big Sleep separately discovered CVE-2025-9478, a use-after-free in Chrome's ANGLE library, on August 11, 2025; Google patched it in Chrome 139.0.7258.154. The CVE carries CVSS 8.8 (High).

AI-discovered Aug 11 2025; CVSS 8.8 (High); patched in Chrome 139.0.7258.154

securityonline.info, 'Google Chrome Patches ANGLE Vulnerability (CVE-2025-9478) Discovered by AI Agent Big Sleep'

AI-generated code introduced known OWASP vulnerabilities in 45 percent of samples across 100-plus models and 80-plus tasks, a leading indicator of AI-introduced defects

45% of AI-generated code samples carried known OWASP vulnerabilities

Veracode 2025 GenAI Code Security Report (via Infosecurity Magazine)

CVEs traced to AI-generated code rose from 6 in January to 15 in February to 35 in March 2026, showing AI-origin vulnerabilities are real and growing

6 in Jan, 15 in Feb, 35 in Mar 2026 (CVEs from AI-generated code)

Georgia Tech Vibe Security Radar (Hanqing Zhao), via Cloud Security Alliance research note 'Vibe Coding's Security Debt: The AI-Generated CVE Surge'

Caveats

The curl sub-5 percent figure comes from a single high-volume open-source program. It is the clearest public validation-rate disclosure, but it is not an industry-wide average: other programs reported less severe slop, and the figure reflects low-skill, LLM-wielding submitters rather than all AI finders. BleepingComputer notes that some of curl's 2026 submissions were in fact real bugs, so the collapse in the confirmed rate is about the volume of noise, not a claim that AI never finds anything.

Big Sleep's 20-vulnerability count and human-in-the-loop description come from Google's own statements via TechCrunch, which also notes that Google withheld specific CVE details while bugs were being fixed. The "human expert in the loop" claim is Google's general policy across all 20 findings, not a documented bug-specific verification of CVE-2025-9478, whose cited source confirms only AI discovery and the Chrome patch.

The 20-vulnerability batch (FFmpeg and ImageMagick) is distinct from the SQLite catch (CVE-2025-6965, July 2025, a threat-intelligence-driven interception of a staged exploit) and from the ANGLE bug (CVE-2025-9478, August 2025). They are separate Big Sleep disclosures, not a single batch. CVE-2025-9478 carries CVSS 8.8, which is High rather than Critical despite some headline framing.

The Veracode 45 percent figure measures code-quality defects in AI-generated code, a leading indicator rather than a direct count of AI-discovered CVEs. The Georgia Tech monthly counts measure CVEs traced to AI-origin code, which overlap with but are not identical to AI-discovered CVEs.

A TF Verdict is TensorFeed's own analysis over cited public data, not a republished dataset. We take a clear position, show the evidence and the sources, and date-stamp the call because the answer can change. Disagree with a data point? Follow the source link and check it yourself.