Nemotron 3 Nano Omni
Mid-tier by NVIDIA
Nemotron 3 Nano Omni 30B-A3B-Reasoning is NVIDIA's April 2026 open-weight multimodal release, built on a hybrid Mamba-Transformer-MoE backbone that activates 3 billion of its 30 billion parameters per token. It processes text, image, video, and audio in one unified sequence, with a 256K-token context window and native audio handling of up to 20 minutes per clip. It reports leading scores on public leaderboards including OCRBenchV2-En (65.8), MMLongBench-Doc (57.5), OSWorld (47.4), Video-MME (72.2), WorldSense (55.4), DailyOmni (74.1), and VoiceBench (89.4). The weights are available on Hugging Face in BF16, FP8, and NVFP4 quantizations; the NVFP4 build runs on a consumer 24 GB GPU.
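The 24 GB claim can be sanity-checked with back-of-the-envelope arithmetic. The sketch below estimates weight memory only, assuming nominal widths of 16, 8, and 4 bits per weight for BF16, FP8, and NVFP4. It ignores quantization scale factors, activations, and the KV/state cache, so real usage will be somewhat higher. The function and constant names are illustrative and do not come from any NVIDIA tooling.

```python
# Rough weight-memory estimate for a 30B-parameter model (weights only).
# Assumption: nominal bit widths; block-scale overhead is ignored.
TOTAL_PARAMS = 30e9   # total parameters, from the model description
ACTIVE_PARAMS = 3e9   # parameters activated per token

BITS_PER_WEIGHT = {"BF16": 16, "FP8": 8, "NVFP4": 4}


def weight_gb(params: float, bits: int) -> float:
    """Return weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params * bits / 8 / 1e9


def fits(params: float, bits: int, vram_gb: float) -> bool:
    """True if the weights alone fit within the given VRAM budget."""
    return weight_gb(params, bits) <= vram_gb


for name, bits in BITS_PER_WEIGHT.items():
    gb = weight_gb(TOTAL_PARAMS, bits)
    print(f"{name}: ~{gb:.0f} GB of weights, fits in 24 GB: {fits(TOTAL_PARAMS, bits, 24)}")

print(f"Active fraction per token: {ACTIVE_PARAMS / TOTAL_PARAMS:.0%}")
```

Under these assumptions only the 4-bit build leaves headroom on a 24 GB card. Although just 3 billion parameters are active per token, a mixture-of-experts model typically still needs all 30 billion resident in memory, which is why the total count drives the estimate.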
Input Price: Free (per 1M tokens)
Output Price: Free (per 1M tokens)
Context Window: 256K tokens
Released: 2026-04
Open source
Capabilities
Key Strengths
- ✓ Native multimodal (text, image, video, audio)
- ✓ 256K context window
- ✓ Top scores on document and video benchmarks
- ✓ Open weights with consumer-GPU formats
- ✓ Audio as first-class modality
Best For
- ▸ Long document and PDF intelligence
- ▸ Video and audio understanding agents
- ▸ Computer-use and GUI automation
- ▸ Self-hosted multimodal inference
Open Source Model
Nemotron 3 Nano Omni is free to download and self-host under the NVIDIA Open Model License. Hosted API pricing varies by provider (e.g., Together, Fireworks, Groq). See our open-source LLM guide for deployment options.