NVIDIA Unveils Nemotron 3 Nano Omni Open-Source Multimodal

According to an April 28 announcement on NVIDIA’s official blog (author Kari Briski), NVIDIA has released Nemotron 3 Nano Omni — an open-source multimodal model that integrates visual, speech, and language capabilities into a single model, aiming to provide AI agent systems with a lower-latency, lower-cost “perception layer.”

Key specs: 30B-A3B MoE, 256K context, 9x throughput, and top positions on 6 leaderboards

Key architecture:

30B-A3B hybrid mixture-of-experts (total parameters 30B, activated 3B)

Integrates Conv3D and EVS encoding

256K context length

Input: text, images, audio, video, documents, charts, GUI screens

Output: text

Performance signals: achieves 9x throughput under the same level of interaction compared to other open-source omni models; takes first place across 6 benchmark leaderboards in three categories—document intelligence, video understanding, and audio understanding (NVIDIA’s announcement does not list specific scores, guiding readers to the developer blog for details).

NVIDIA positions Nemotron 3 Nano Omni as the “eyes and ears” within agent systems. It can be tasked in combination with other family models such as Nemotron 3 Super (high-frequency execution) and Nemotron 3 Ultra (complex planning), and it can also interoperate with third-party cloud models. Three typical agent application scenarios:

Computer Use Agent: native 1920×1080 resolution visual reasoning

Document intelligence: cross-figures, tables, screenshots, and mixed-media input inference

Audio/video understanding: integrate spoken content, visual content, and recordings into a single reasoning chain

Adopting lineup: Hon Hai, Palantir join; H Company CEO puts name to the statement

In NVIDIA’s announcement, it clearly distinguishes “production adoption” from “currently being evaluated”:

Already in production: Aible, Applied Scientific Intelligence (ASI), Eka Care, Hon Hai (Foxconn), H Company, Palantir, Pyler

Currently being evaluated: Amdocs, Dell, Docusign, Infosys, IQVIA, Lila, Oracle, Quantiphi, TCS, Zefr, etc.

In the announcement, H Company CEO Gautier Cloix puts his name to the statement: “To build useful agents, you can’t wait seconds for a model to interpret a screen. By building on Nemotron 3 Nano Omni, our agents can rapidly interpret full HD screen recordings — something that wasn’t practical before.” Translation: “To build useful agents, you can’t wait seconds for a model to interpret a screen. By building on Nemotron 3 Nano Omni, our agents can rapidly interpret full HD screen recordings — something that wasn’t practical before.”

Open-source strategy and deployment: weights / datasets / training methods fully公开

At the time of release, NVIDIA also makes public:

Model weights

Training datasets

Training technologies / methodologies

The deployment pipeline covers three layers:

Local workstations: NVIDIA DGX Spark, DGX Station

NIM microservices: build.nvidia.com

Third-party platforms: Hugging Face, OpenRouter, and—via 25+ NVIDIA Cloud Partners, inference platforms, and cloud service providers—providing it

Custom tools use NVIDIA NeMo. Over the past year, the Nemotron 3 family (Nano / Super / Ultra) has accumulated more than 50 million downloads on Hugging Face. With this Omni release, it extends the family’s capabilities into the multimodal and agentic domains.

This article about NVIDIA publishing the open-source multimodal Nemotron 3 Nano Omni first appeared on 链新闻 ABMedia.

Disclaimer: The information on this page may come from third-party sources and is for reference only. It does not represent the views or opinions of Gate and does not constitute any financial, investment, or legal advice. Virtual asset trading involves high risk. Please do not rely solely on the information on this page when making decisions. For details, see the Disclaimer.
Comment
0/400
No comments