Multimodal Models
GET /api/multimodal (Free)

The /api/multimodal endpoint returns the catalog of production image generation, video generation, text-to-speech (TTS), and speech-to-text (STT) models. Pricing is expressed in modality-native units (per image, per second of video, per 1k characters, per minute of audio), so prices are not comparable across modalities; the primary use is sorting by price within a single modality.
When to use this endpoint
When your agent needs to pick a multimodal model for image, video, TTS, or STT work. The /api/models endpoint covers chat models; this endpoint is its peer for the other modalities.
Parameters
| Name | In | Type | Description |
|---|---|---|---|
| modality | query | string | Filter to "image", "video", "tts", or "stt" (e.g. modality=video) |
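If the SDKs are unavailable, the request URL can be built by hand. A minimal sketch in Python's standard library; the base URL is taken from the TypeScript sample below, and the allowed modality values come from the table above:

```python
from urllib.parse import urlencode

BASE = "https://tensorfeed.ai/api/multimodal"
MODALITIES = {"image", "video", "tts", "stt"}

def multimodal_url(modality=None):
    """Build the request URL; modality is the only documented query parameter."""
    if modality is None:
        return BASE
    if modality not in MODALITIES:
        raise ValueError(f"unknown modality: {modality!r}")
    return f"{BASE}?{urlencode({'modality': modality})}"
```

Omitting modality returns the full catalog across all four modalities.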
Example response
{
  "ok": true,
  "lastUpdated": "2026-04-30",
  "count": 6,
  "models": [
    {
      "id": "veo-3",
      "name": "Veo 3",
      "provider": "Google",
      "modality": "video",
      "pricingUnit": "per_second_video",
      "pricingAmount": 0.50,
      "maxOutput": "8s @ 1080p with audio",
      "features": ["native audio", "lip-sync", "image-to-video"]
    }
  ]
}

Code samples
Python SDK
from tensorfeed import TensorFeed

tf = TensorFeed()
data = tf.multimodal(modality="video")

# Sort cheapest first; treat a missing price as 0 so it sorts to the top.
for m in sorted(data["models"], key=lambda x: x["pricingAmount"] or 0):
    print(f"{m['name']:<24} {m['pricingAmount']}/sec ({m['provider']})")

TypeScript SDK
const res = await fetch("https://tensorfeed.ai/api/multimodal?modality=video");
const { models } = await res.json();
for (const m of models) console.log(`${m.name}: ${m.pricingAmount}/sec`);

FAQ
How are different modalities priced?
Image: per image. Video: per second of generated video. TTS: per 1k characters of input text. STT: per minute of input audio. The pricingUnit field on each entry says exactly which unit applies. Cross-modality price comparison is not meaningful; sort within a modality.
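These per-unit rules translate into a small cost estimator. A sketch, assuming hypothetical pricingUnit strings for the non-video modalities (only per_second_video appears in the example response above):

```python
# Quantity each documented unit bills on. Only "per_second_video" is
# confirmed by the example response; the other three strings are assumptions.
UNIT_QUANTITY = {
    "per_image": "images",
    "per_second_video": "seconds",
    "per_1k_characters": "characters",  # billed per 1,000 characters
    "per_minute_audio": "minutes",
}

def estimate_cost(model, quantity):
    """Estimate job cost for one catalog entry.

    `quantity` is in the unit's native measure: images, seconds of video,
    characters of input text, or minutes of input audio.
    """
    unit = model["pricingUnit"]
    if unit not in UNIT_QUANTITY:
        raise ValueError(f"unknown pricingUnit: {unit!r}")
    amount = model["pricingAmount"]
    if unit == "per_1k_characters":
        return amount * quantity / 1000  # price is quoted per 1k characters
    return amount * quantity

veo = {"pricingUnit": "per_second_video", "pricingAmount": 0.50}
print(estimate_cost(veo, 8))  # 8 s at $0.50/s -> 4.0
```

Because the units differ, comparing these estimates only makes sense for jobs within the same modality, matching the guidance above.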