Multimodal Models

Free
GET /api/multimodal

The /api/multimodal endpoint returns the catalog of production models for image generation, video generation, text-to-speech (TTS), and speech-to-text (STT). Prices are given in modality-native units (per image, per second of video, per 1k characters, per minute of audio), so they are not comparable across modalities. The primary use is sorting by price within a single modality.
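
As a minimal sketch of within-modality sorting, the snippet below groups catalog entries by their modality field and sorts each group by pricingAmount. The helper name cheapest_first_by_modality and the sample entries are hypothetical; only the field names come from the example response on this page.

```python
from collections import defaultdict

def cheapest_first_by_modality(models):
    """Group entries by modality, then sort each group by price (hypothetical helper)."""
    groups = defaultdict(list)
    for m in models:
        groups[m["modality"]].append(m)
    # Prices are only comparable inside one modality, so each group is sorted on its own.
    # Entries with no price (None) go last rather than being treated as free.
    return {
        modality: sorted(
            entries,
            key=lambda m: (m["pricingAmount"] is None, m["pricingAmount"] or 0),
        )
        for modality, entries in groups.items()
    }

# Placeholder entries for illustration only; ids and prices are made up.
sample = [
    {"id": "video-b", "modality": "video", "pricingAmount": 0.75},
    {"id": "image-a", "modality": "image", "pricingAmount": 0.04},
    {"id": "video-a", "modality": "video", "pricingAmount": 0.50},
    {"id": "video-c", "modality": "video", "pricingAmount": None},
]
ranked = cheapest_first_by_modality(sample)
print([m["id"] for m in ranked["video"]])
```

Sorting unpriced entries last is a design choice made for this sketch: it keeps a missing price from looking like the cheapest option.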

When to use this endpoint

Use this endpoint when your agent needs to pick a model for image, video, TTS, or STT work. The /api/models endpoint covers chat models; this endpoint is its counterpart for the other modalities.

Parameters

Name      In     Type    Description
modality  query  string  Filter to "image", "video", "tts", or "stt" (e.g. video)

* required
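
A small sketch of how a client might build the request URL and reject unsupported filter values before sending anything. The helper name multimodal_url is hypothetical, and the code assumes modality is optional because it carries no required marker in the parameter table. The base URL is the one used in the TypeScript sample on this page.

```python
from urllib.parse import urlencode

BASE_URL = "https://tensorfeed.ai/api/multimodal"  # taken from the TypeScript sample
MODALITIES = {"image", "video", "tts", "stt"}      # values listed in the parameter table

def multimodal_url(modality=None):
    """Return the request URL; modality is treated as optional (hypothetical helper)."""
    if modality is None:
        return BASE_URL
    if modality not in MODALITIES:
        raise ValueError(
            f"modality must be one of {sorted(MODALITIES)}, got {modality!r}"
        )
    return f"{BASE_URL}?{urlencode({'modality': modality})}"

print(multimodal_url("video"))
```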

Example response

{
  "ok": true,
  "lastUpdated": "2026-04-30",
  "count": 6,
  "models": [
    {
      "id": "veo-3",
      "name": "Veo 3",
      "provider": "Google",
      "modality": "video",
      "pricingUnit": "per_second_video",
      "pricingAmount": 0.50,
      "maxOutput": "8s @ 1080p with audio",
      "features": ["native audio", "lip-sync", "image-to-video"]
    }
  ]
}
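
One way to consume this response is to parse it into typed records. The sketch below is an assumption-laden illustration, not the SDK's own data model: the MultimodalModel class and parse_catalog helper are hypothetical, and treating pricingAmount, maxOutput, and features as optional is a guess rather than something the page states.

```python
import json
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class MultimodalModel:
    id: str
    name: str
    provider: str
    modality: str
    pricing_unit: str
    pricing_amount: Optional[float]
    max_output: Optional[str] = None
    features: List[str] = field(default_factory=list)

def parse_catalog(payload: str) -> List[MultimodalModel]:
    """Parse a response body into typed records (hypothetical helper)."""
    body = json.loads(payload)
    if not body.get("ok"):
        raise ValueError("response did not report ok: true")
    return [
        MultimodalModel(
            id=m["id"],
            name=m["name"],
            provider=m["provider"],
            modality=m["modality"],
            pricing_unit=m["pricingUnit"],
            pricing_amount=m.get("pricingAmount"),  # assumed nullable
            max_output=m.get("maxOutput"),          # assumed optional
            features=m.get("features", []),         # assumed optional
        )
        for m in body["models"]
    ]

# The example response from this page, compacted.
example = """{"ok": true, "lastUpdated": "2026-04-30", "count": 6, "models": [
  {"id": "veo-3", "name": "Veo 3", "provider": "Google", "modality": "video",
   "pricingUnit": "per_second_video", "pricingAmount": 0.50,
   "maxOutput": "8s @ 1080p with audio",
   "features": ["native audio", "lip-sync", "image-to-video"]}]}"""
models = parse_catalog(example)
print(models[0].name, models[0].pricing_unit, models[0].pricing_amount)
```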

Code samples

Python SDK

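from tensorfeed import TensorFeed

tf = TensorFeed()
# Fetch only video models, then print them cheapest first.
data = tf.multimodal(modality="video")
# A missing price (None) is treated as 0, so such entries sort first.
for m in sorted(data["models"], key=lambda x: x["pricingAmount"] or 0):
    print(f"{m['name']:<24} {m['pricingAmount']}/sec  ({m['provider']})")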

TypeScript SDK

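// Plain fetch against the REST endpoint; no SDK import is needed.
const res = await fetch("https://tensorfeed.ai/api/multimodal?modality=video");
if (!res.ok) throw new Error(`Request failed with HTTP ${res.status}`);
const { models } = await res.json();
for (const m of models) console.log(`${m.name}: ${m.pricingAmount}/sec`);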

FAQ

How are different modalities priced?

Image: per image. Video: per second of generated video. TTS: per 1k characters of input text. STT: per minute of input audio. The pricingUnit field on each entry says exactly which unit applies. Cross-modality price comparison is not meaningful; sort within a modality.
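
To make the unit rule concrete, here is a hedged sketch that estimates the cost of a job in a model's native unit and refuses to compare entries whose pricingUnit differs. Both helper names are hypothetical. The only unit string taken from this page is per_second_video, and the Veo 3 price is the one shown in the example response.

```python
def estimate_cost(model, quantity):
    """Cost of `quantity` native units (hypothetical helper).

    quantity is counted in the unit named by model["pricingUnit"]: images,
    seconds of video, thousands of characters, or minutes of audio.
    """
    if model["pricingAmount"] is None:
        raise ValueError(f"{model['id']} has no listed price")
    return model["pricingAmount"] * quantity

def cheaper(a, b):
    """Return the cheaper of two priced entries, refusing to compare across units."""
    if a["pricingUnit"] != b["pricingUnit"]:
        raise ValueError(
            f"cannot compare {a['pricingUnit']} with {b['pricingUnit']}"
        )
    return a if a["pricingAmount"] <= b["pricingAmount"] else b

veo3 = {"id": "veo-3", "pricingUnit": "per_second_video", "pricingAmount": 0.50}
print(estimate_cost(veo3, 8))  # 8 seconds of video at 0.50 per second
```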

Related endpoints