Nvidia Debuts Suite of Microservices for Enhanced Inferencing Capabilities

At its GTC conference, Nvidia today announced Nvidia NIM, a new software platform designed to streamline the deployment of custom and pre-trained AI models into production environments. NIM takes the software work Nvidia has done around model inference and optimization and makes it easily accessible by combining a given model with an optimized inference engine, packaging the two into a container, and exposing that container as a microservice.
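Nvidia didn't share code in the announcement, but the deployment model it describes can be sketched roughly as follows: a locally running NIM container serving a model behind an HTTP endpoint that an application calls like any other microservice. The URL, port, model identifier, and OpenAI-compatible request shape below are illustrative assumptions, not details confirmed by Nvidia.

```python
import requests

# Hypothetical sketch: querying a NIM-style container running locally that
# exposes an OpenAI-compatible chat endpoint. The URL, port, and model name
# are assumptions for illustration only.
NIM_URL = "http://localhost:8000/v1/chat/completions"

payload = {
    "model": "example-llm",  # hypothetical model identifier served by the container
    "messages": [
        {"role": "user", "content": "Summarize what a NIM microservice does."}
    ],
    "max_tokens": 128,
}

response = requests.post(NIM_URL, json=payload, timeout=60)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```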

According to Nvidia, it typically takes developers weeks — if not months — to ship similar containers, especially if they do not have in-house AI talent. NIM aims to change that by offering an ecosystem of AI-ready containers, curated by Nvidia, that use its hardware as the foundational layer. These microservices will serve as the core software layer for companies looking to accelerate their AI roadmaps.

NIM currently supports models from Nvidia, AI21, Adept, Cohere, Getty Images, and Shutterstock, as well as open models from Google, Hugging Face, Meta, Microsoft, Mistral AI, and Stability AI. Nvidia is also collaborating with Amazon, Google, and Microsoft to make these microservices available on SageMaker, Kubernetes Engine, and Azure AI, respectively. They will also be integrated into popular frameworks like Deepset, LangChain, and LlamaIndex.
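Nvidia didn't detail what the framework integrations will look like, but as a sketch of how the LangChain side could work, the snippet below uses the existing langchain-nvidia-ai-endpoints package to point a LangChain chat model at a locally hosted endpoint. The endpoint URL and model identifier are placeholders, not details from the announcement.

```python
# Sketch of calling a NIM-hosted model through LangChain, assuming the
# langchain-nvidia-ai-endpoints integration package is installed
# (`pip install langchain-nvidia-ai-endpoints`). The base_url and model
# name below are hypothetical placeholders.
from langchain_nvidia_ai_endpoints import ChatNVIDIA

llm = ChatNVIDIA(
    base_url="http://localhost:8000/v1",  # hypothetical local NIM endpoint
    model="example-llm",                  # hypothetical model identifier
)

print(llm.invoke("What does NIM package alongside the model weights?").content)
```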

“We believe that the Nvidia GPU is the best place to run inference of these models on […], and we believe that NVIDIA NIM is the best software package, the best runtime, for developers to build on top of. This way, they can focus on the enterprise applications and let Nvidia do the work to efficiently produce these models, allowing them to just do the rest of their work,” said Manuvir Das, the head of enterprise computing at Nvidia, during a press conference ahead of today’s announcements.

As for the inference engine, Nvidia will use the Triton Inference Server, TensorRT, and TensorRT-LLM. Some of the Nvidia microservices available through NIM will include Riva for customizing speech and translation models, cuOpt for routing optimizations, and Earth-2 for weather and climate simulations.
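The announcement doesn't spell out how these engines are surfaced inside a NIM container, but Triton Inference Server already ships a Python client, so a direct call to a Triton-served model looks roughly like the sketch below. The model name and tensor names are hypothetical placeholders.

```python
# Minimal sketch of querying a Triton Inference Server with its Python HTTP
# client (`pip install tritonclient[http]`). The model name and tensor names
# ("INPUT0", "OUTPUT0") are assumed for illustration.
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# Build a dummy FP32 input tensor matching the (assumed) model signature.
input_tensor = httpclient.InferInput("INPUT0", [1, 16], "FP32")
input_tensor.set_data_from_numpy(np.zeros((1, 16), dtype=np.float32))

result = client.infer(model_name="example_model", inputs=[input_tensor])
print(result.as_numpy("OUTPUT0"))
```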

The company plans to add additional capabilities over time, such as making the Nvidia RAG LLM operator available as a NIM. This promises to make building generative AI chatbots that can pull in custom data a lot easier.
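Nvidia hasn't published the RAG LLM operator as a NIM yet, so details are scarce, but the retrieval-augmented pattern it targets can be sketched in a few lines: retrieve relevant snippets from custom data, then pass them to the model as context. Everything below — the toy keyword retriever, the endpoint URL, the model name — is an illustrative assumption, not Nvidia's implementation.

```python
import requests

# Toy retrieval-augmented generation (RAG) sketch: a naive keyword "retriever"
# over in-memory documents, with retrieved text passed to a chat endpoint as
# context. The endpoint URL and model name are hypothetical placeholders.
DOCS = [
    "Q3 revenue grew 12% year over year, driven by the enterprise segment.",
    "The support team ships a new knowledge-base article every Friday.",
]

def retrieve(question: str, docs: list[str], k: int = 1) -> list[str]:
    """Rank documents by naive keyword overlap with the question."""
    q_words = set(question.lower().split())
    return sorted(docs, key=lambda d: -len(q_words & set(d.lower().split())))[:k]

def answer(question: str) -> str:
    context = "\n".join(retrieve(question, DOCS))
    payload = {
        "model": "example-llm",  # hypothetical model identifier
        "messages": [
            {"role": "system", "content": f"Answer using this context:\n{context}"},
            {"role": "user", "content": question},
        ],
    }
    r = requests.post("http://localhost:8000/v1/chat/completions", json=payload, timeout=60)
    r.raise_for_status()
    return r.json()["choices"][0]["message"]["content"]

print(answer("How much did revenue grow in Q3?"))
```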

Of course, no developer conference is complete without customer and partner announcements. Among NIM’s current users are industry giants such as Box, Cloudera, Cohesity, Datastax, Dropbox, and NetApp.

“Established enterprise platforms are sitting on a goldmine of data that can be transformed into generative AI copilots,” said Jensen Huang, founder and CEO of Nvidia. “Created with our partner ecosystem, these containerized AI microservices are the building blocks for enterprises in every industry to become AI companies.”
