Name: NVIDIA Nemotron 3 Nano Omni
Author: NVIDIA

NVIDIA Nemotron 3 Nano Omni — Specifications

Developer	NVIDIA
Type	Multimodal (omni)
Modality	Text, Image, Audio, Video → Text
Parameters	30B total / ~3B active (MoE)
Context window	256K
Max output	—
License	NVIDIA Open Model Agreement
Open weights	✅ Yes
Released	2026
Input price	—
Output price	—
API providers	Hugging Face, OpenRouter, NVIDIA NIM

🖥️ Run it locally

VRAM (FP16/BF16)	~62 GB
VRAM (4-bit)	~21 GB (NVFP4)
Minimum GPU	RTX 5090 32GB (NVFP4) / H100 80GB (BF16)

📊 Benchmarks

OCRBench V2	67.04
Video-MME	72.2
OSWorld	47.4
Speech IF	89.39

Official page →

NVIDIA’s open omni-modal model — it sees, hears, watches and reads (text, image, audio, video → text) in a single 30B-A3B mixture-of-experts that activates only ~3B parameters per token. A Mamba-Transformer hybrid that runs on one high-end GPU; open weights under the NVIDIA Open Model Agreement (commercial use allowed).