Nemotron 3 Nano Omni

Mid-tier

Nemotron 3 Nano Omni 30B-A3B-Reasoning is NVIDIA's April 2026 open-weight multimodal release, built on a hybrid Mamba-Transformer-MoE backbone that activates 3 billion of its 30 billion parameters per token. It processes text, image, video, and audio in one unified sequence with a 256K-token context window and handles audio natively, up to 20 minutes per clip. It tops public leaderboards including OCRBenchV2-En (65.8), MMLongBench-Doc (57.5), OSWorld (47.4), Video-MME (72.2), WorldSense (55.4), DailyOmni (74.1), and VoiceBench (89.4). The weights are available on Hugging Face in BF16, FP8, and NVFP4 precisions; the NVFP4 build runs on a consumer 24GB GPU.
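The parameter counts and precisions above can be turned into a back-of-the-envelope memory estimate, which helps explain why only the NVFP4 build is described as fitting a 24GB card. The sketch below is an illustration, not a measurement. It assumes nominal widths of 16, 8, and 4 bits per weight, counts weights only (no KV cache, activations, or quantization scale factors), and uses decimal gigabytes. The function names are hypothetical.

```python
# Rough weight-only memory estimate for a 30B-parameter model.
# Assumptions (not from the model card): nominal bit widths per format,
# no KV cache, no activations, no per-block scale-factor overhead.

TOTAL_PARAMS = 30e9   # total parameters, as stated in the description
ACTIVE_PARAMS = 3e9   # parameters activated per token

BITS_PER_WEIGHT = {"BF16": 16, "FP8": 8, "NVFP4": 4}


def weight_gb(params: float, bits: int) -> float:
    """Weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params * bits / 8 / 1e9


def fits(params: float, bits: int, gpu_gb: float = 24.0) -> bool:
    """True if the weights alone fit in gpu_gb of memory."""
    return weight_gb(params, bits) <= gpu_gb


if __name__ == "__main__":
    print(f"active fraction per token: {ACTIVE_PARAMS / TOTAL_PARAMS:.0%}")
    for name, bits in BITS_PER_WEIGHT.items():
        size = weight_gb(TOTAL_PARAMS, bits)
        verdict = "fits" if fits(TOTAL_PARAMS, bits) else "does not fit"
        print(f"{name}: ~{size:.0f} GB of weights, {verdict} in 24 GB")
```

Under these assumptions the weights come to roughly 60 GB, 30 GB, and 15 GB respectively. The estimate uses all 30 billion parameters because a mixture-of-experts model typically keeps every expert resident in memory, even though only about 10% of the parameters are active for any given token. Real headroom will be smaller once the KV cache for long contexts is included.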

Input Price: Free (per 1M tokens)

Output Price: Free (per 1M tokens)

Context Window: 256K tokens

Released: 2026-04

Open source

Capabilities

text, vision, audio, video, code, reasoning, tool-use

Key Strengths

✓ Native multimodal (text, image, video, audio)
✓ 256K context window
✓ Top scores on document and video benchmarks
✓ Open weights with consumer-GPU formats
✓ Audio as first-class modality

Best For

▸ Long document and PDF intelligence
▸ Video and audio understanding agents
▸ Computer-use and GUI automation
▸ Self-hosted multimodal inference

Open Source Model

Nemotron 3 Nano Omni is free to download and self-host under the NVIDIA Open Model License. Hosted API pricing varies by provider (e.g., Together, Fireworks, Groq). See our open source LLM guide for deployment options.