
Provider Status Pages Are Marketing. We Built Our Own LLM Probes.

Ripper · 6 min read

Every major LLM provider runs a status page. They look reassuring. They are also, almost without exception, politically managed. Partial outages get downgraded to “some users may experience elevated latency.” Regional brownouts vanish from the timeline once they're fixed. The aggregate SLA number on the page reads “99.9% over the trailing 30 days” in a typeface chosen by a designer who has never been paged at 3 a.m.

If you build agents, this is a real problem. Your routing logic needs to know when a provider is slow, not when its status page admits a provider is slow. Those are different signals, and the gap between them is where customer trust dies.

So today we shipped something. TensorFeed now actively measures LLM provider latency and availability from Cloudflare's edge.

What We Measure

Every fifteen minutes, our Worker fires a single short prompt at each provider whose key we've configured. We record the following (a sketch of one probe cycle follows the list):

  • HTTP status code (the truth, not what the status page says)
  • Time to first response byte
  • Total response time including the body read
  • Whether the response shape was a valid completion
  • Tokens consumed, for cost normalization
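
In code terms, one probe cycle looks roughly like the sketch below. The ProbeResult shape and field names are illustrative rather than our exact production schema, and the completion check is deliberately generic, since every provider's response body differs.

```ts
// Minimal sketch of one probe cycle. ProbeResult field names are illustrative,
// not the production schema; `body` is whatever minimal prompt the provider needs.
interface ProbeResult {
  provider: string;
  status: number;          // HTTP status code as returned, not as reported
  ttfbMs: number;          // time to first response byte
  totalMs: number;         // total time including the body read
  validCompletion: boolean;
  tokensUsed: number | null;
}

async function probeProvider(
  provider: string,
  url: string,
  headers: Record<string, string>,
  body: unknown,
): Promise<ProbeResult> {
  const start = Date.now();
  const res = await fetch(url, {
    method: "POST",
    headers: { "content-type": "application/json", ...headers },
    body: JSON.stringify(body),
  });

  // Time to first byte: note the clock when the first chunk arrives, then drain the rest.
  let ttfbMs = -1;
  let text = "";
  const decoder = new TextDecoder();
  const reader = res.body!.getReader();
  for (let chunk = await reader.read(); !chunk.done; chunk = await reader.read()) {
    if (ttfbMs < 0) ttfbMs = Date.now() - start;
    text += decoder.decode(chunk.value, { stream: true });
  }
  const totalMs = Date.now() - start;
  if (ttfbMs < 0) ttfbMs = totalMs; // empty body: fall back to total time

  // "Valid completion" here just means a 2xx response whose body parsed as JSON;
  // each provider's real response shape gets its own check.
  let validCompletion = false;
  let tokensUsed: number | null = null;
  try {
    const parsed = JSON.parse(text);
    validCompletion = res.ok && parsed !== null && typeof parsed === "object";
    tokensUsed = parsed?.usage?.total_tokens ?? null;
  } catch {
    // non-JSON body: leave validCompletion false
  }

  return { provider, status: res.status, ttfbMs, totalMs, validCompletion, tokensUsed };
}
```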

Results stream into a 24-hour ring buffer per provider. A pre-computed summary updates on every cycle and is exposed at /api/probe/latest for free, no auth, no signup. A daily roll-up cron creates a per-day aggregate that backs the premium time-series endpoint at /api/premium/probe/series. Two endpoints, one moat that compounds for as long as we keep the cron running.
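For the curious, here is a rough sketch of how that ring buffer and summary could be maintained, assuming Workers KV with one key per provider and reusing the ProbeResult shape from the sketch above. The key names and summary fields are assumptions, not the live schema.

```ts
// Sketch of the per-provider ring buffer and the pre-computed summary,
// assuming Workers KV. Key names and summary fields are assumptions.
const WINDOW_MS = 24 * 60 * 60 * 1000;

type StoredProbe = ProbeResult & { ts: number }; // ts = epoch millis when the probe ran

async function recordAndSummarize(kv: KVNamespace, result: ProbeResult): Promise<void> {
  const now = Date.now();
  const key = `probes:${result.provider}`;

  // Append the new result and drop anything older than 24 hours: the ring buffer.
  const history = ((await kv.get(key, "json")) as StoredProbe[] | null) ?? [];
  const window = [...history, { ...result, ts: now }].filter((p) => now - p.ts < WINDOW_MS);
  await kv.put(key, JSON.stringify(window));

  // Pre-compute the summary that /api/probe/latest serves, on every cycle.
  const ok = window.filter((p) => p.validCompletion);
  const ttfbs = ok.map((p) => p.ttfbMs).sort((a, b) => a - b);
  const summary = {
    provider: result.provider,
    samples: window.length,
    successRate: window.length ? ok.length / window.length : 0,
    p50TtfbMs: ttfbs.length ? ttfbs[Math.floor(ttfbs.length / 2)] : null,
    updatedAt: now,
  };
  await kv.put(`summary:${result.provider}`, JSON.stringify(summary));
}
```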

Day Zero, Four Providers

We launched today with measured probing across four providers: Anthropic, Google, Mistral, and Cohere. OpenAI sits out for now; adding it is one secret away if we choose. Each provider was a deliberate choice. Together they cover the spectrum from two of the well-funded big three (Anthropic, Google) down to the credible challengers (Mistral, Cohere) that often beat the bigger labs on speed.

The first hour of measurements already produced a finding I did not expect. Cohere's p50 time-to-first-byte from Cloudflare's edge clocked in at 264 milliseconds. Mistral landed around 500. Google around 606. Anthropic, on Claude Haiku 4.5 with our smallest possible prompt, came in north of 5,000.

That number deserves caveats, and they are honest ones. Two probes do not make a steady-state measurement. Cold-start effects matter. Edge routing matters. The model could be returning before our fetch buffer drains. We will have a fair answer in seven days. But the shape of the data is already telling: the fastest API in our sample is Cohere, not the provider you would guess from press releases.

This is exactly why the dataset matters. Marketing will tell you which model is “state of the art.” Routing decisions care about something else. They care about the request that just left your code returning before your user gives up.

Why This Compounds

The economic case for the system is simple. Probe cost across all four providers is roughly ten cents a month at our cadence. A per-provider daily call cap is hard-coded in our Worker so a runaway cron cannot empty an Anthropic balance. The data we generate is something nobody else publishes systematically.
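
A cap like that is only a few lines. The sketch below assumes the counter lives in Workers KV; the cap value and key format are illustrative, and the real number is hard-coded in the Worker.

```ts
// Sketch of the per-provider daily cap, assuming the counter lives in Workers KV.
// The cap value and key format are illustrative; the real number is hard-coded.
const DAILY_CAP = 200; // example value, not the production cap

async function underDailyCap(kv: KVNamespace, provider: string): Promise<boolean> {
  const day = new Date().toISOString().slice(0, 10); // UTC date, e.g. "2026-02-03"
  const key = `calls:${provider}:${day}`;
  const used = Number((await kv.get(key)) ?? "0");
  if (used >= DAILY_CAP) return false; // a runaway cron stops here, not at the invoice
  // Count the call before making it; let the counter expire after two days.
  await kv.put(key, String(used + 1), { expirationTtl: 2 * 24 * 60 * 60 });
  return true;
}
```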

Day zero is a snapshot. Day ninety is a 90-day SLA history per provider that anyone building an agent can query for one credit. Year one is a measured comparison nobody can match without having started measuring on day one. That is the whole game with this kind of data. You cannot backfill a time series. Either the probes ran or they did not.

We started the probes today.

What Comes Next

A few things are on the runway. First, weekly SLA reports starting next Monday. Per-provider uptime, p50 / p95 / p99 latency, notable incident hours, all from our own measurements. We will publish them as originals on this site and structure the data so anyone can cite it. Second, more providers as the moat justifies them: OpenAI, Groq, Together, DeepInfra, regional alternates. Third, regional probes from multiple Cloudflare points of presence so we can see when a problem is global versus local.
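
For reference, the latency percentiles in those reports are plain order statistics; a nearest-rank helper like the one below is all the math involved. The sample values in the usage line are illustrative, not measurements.

```ts
// Nearest-rank percentiles over a sorted latency sample: all the math the weekly
// report needs. The sample values below are illustrative, not measurements.
function percentile(sorted: number[], p: number): number | null {
  if (sorted.length === 0) return null;
  const rank = Math.ceil(p * sorted.length) - 1;
  return sorted[Math.min(Math.max(rank, 0), sorted.length - 1)];
}

const sample = [120, 180, 240, 400, 900, 4800].sort((a, b) => a - b);
console.log(percentile(sample, 0.5), percentile(sample, 0.95), percentile(sample, 0.99));
// -> 240 4800 4800 for this toy sample
```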

For agent builders reading this: hit /api/probe/latest right now. It returns a summary of the last 24 hours of measured provider latency across whatever providers we have keys configured for. It is free. There is no rate limit beyond our normal IP cap. If you want the historical series, the premium endpoint is one credit per call and accepts USDC on Base mainnet without an account.
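
If it helps, here is a minimal sketch of wiring that summary into routing logic. The base URL is a placeholder, and the fields assumed for the response are a guess at the summary shape rather than a documented contract.

```ts
// Sketch of pulling the free summary into routing logic. BASE_URL is a placeholder,
// and the response fields assumed here are a guess, not a documented contract.
const BASE_URL = "https://tensorfeed.example"; // placeholder: point at wherever this site lives

async function pickFastestProvider(): Promise<string | null> {
  const res = await fetch(`${BASE_URL}/api/probe/latest`);
  if (!res.ok) return null;
  const summaries = (await res.json()) as { provider: string; p50TtfbMs: number | null }[];
  const usable = summaries.filter((s) => s.p50TtfbMs !== null);
  usable.sort((a, b) => a.p50TtfbMs! - b.p50TtfbMs!);
  return usable[0]?.provider ?? null;
}
```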

For the rest of you, just follow along. Next Monday's report will be the first one with enough data to actually mean something.

This article was written on the same day the probing system shipped to production. Numbers cited reflect the first hour of measurements and will be refined in the first weekly report.