Name: NVIDIA Nemotron 3 Nano Omni
Author: NVIDIA

NVIDIA Nemotron 3 Nano Omni — Specifications

Developer	NVIDIA
Type	Multimodal (omni)
Modality	Text, Image, Audio, Video → Text
Parameters	30B total / ~3B active (MoE)
Context window	256K tokens
Max output	—
License	(Nemotron Open Model License)
Open weights	✅ Yes
Release date	2026
Input price	—
Output price	—
API providers	Hugging Face, OpenRouter, NVIDIA NIM

🖥️ Run it locally

VRAM (FP16/BF16)	~62 GB
VRAM (4-bit)	~21 GB (NVFP4)
Minimum GPU	RTX 5090 32GB (NVFP4) / H100 80GB (BF16)
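
As a rough sanity check on the VRAM figures above, the weights-only footprint can be estimated from the parameter count and the bits stored per parameter. This is a minimal sketch, not NVIDIA's sizing method: the helper `weight_memory_gb` is a hypothetical name, and it assumes decimal gigabytes and a uniform precision across all 30B parameters. The listed figures (~62 GB and ~21 GB) are higher than these estimates, presumably because a real deployment also holds quantization scale factors, any layers kept at higher precision, and runtime buffers.

```python
def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Weights-only memory in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


TOTAL_PARAMS = 30e9  # 30B total parameters, from the spec table

# Assumption: every parameter is stored at the same precision.
bf16_gb = weight_memory_gb(TOTAL_PARAMS, 16)  # BF16/FP16: 16 bits each
fp4_gb = weight_memory_gb(TOTAL_PARAMS, 4)    # 4-bit formats such as NVFP4

print(f"BF16 weights only:  {bf16_gb:.0f} GB")
print(f"4-bit weights only: {fp4_gb:.0f} GB")
```

The gap between these lower bounds and the listed figures is the headroom to budget for when choosing between the two GPUs named in the table.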

📊 Benchmarks

OCRBench V2	67.04
Video-MME	72.2
OSWorld	47.4
Speech IF	89.39

NVIDIA’s open omni-modal model: it sees, hears, watches and reads (text, image, audio, video → text) in a single 30B-A3B mixture-of-experts that activates only ~3B of its 30B parameters per token. It is a Mamba-Transformer hybrid that runs on one high-end GPU (an RTX 5090 32GB with NVFP4, or an H100 80GB with BF16). The weights are open under the NVIDIA Open Model Agreement, which allows commercial use.
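
To make the 30B-A3B notation concrete, the sketch below works out what fraction of the model is active per token and compares a rough per-token compute estimate against a hypothetical dense model of the same total size. It relies on an assumption that is not from the source: the common rule of thumb of about 2 FLOPs per active parameter per token for a forward pass. That rule is only approximate for a Mamba-Transformer hybrid, so the numbers are illustrative.

```python
TOTAL_PARAMS = 30e9   # all experts must be resident in memory
ACTIVE_PARAMS = 3e9   # ~3B parameters actually used per token

# Share of the model that participates in each token's forward pass.
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS

# Assumption: ~2 FLOPs per active parameter per token (rule of thumb).
moe_flops_per_token = 2 * ACTIVE_PARAMS
dense_flops_per_token = 2 * TOTAL_PARAMS  # hypothetical dense 30B model

print(f"Active fraction: {active_fraction:.0%}")
print(f"Approx. compute vs. dense 30B: "
      f"{moe_flops_per_token / dense_flops_per_token:.0%}")
```

The point of the comparison is that memory scales with the total parameter count while per-token compute scales with the active count, which is why the model needs tens of gigabytes of VRAM but does far less work per token than a dense model of that size.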