Simplifying Private AI Model Deployments with OctoStack: The Latest Solution from OctoAI

In its early days, OctoAI focused almost exclusively on optimizing models to run more effectively. With the rise of generative AI, the team then launched the fully managed OctoAI platform to help its users serve and fine-tune existing models. OctoStack, at its core, is that OctoAI platform, but for private deployments. Deploying OctoStack should be straightforward for most enterprises, as OctoAI delivers the platform with read-to-go containers and their associated Helm charts for deployments. For developers, the API remains the same, no matter whether they are targeting the SaaS product or OctoAI in their private cloud.

OctoAI (formerly known as OctoML), has officially launched OctoStack – a comprehensive solution that simplifies the deployment of generative AI models in a company’s private cloud. OctoStack supports deployment on-premises or in a virtual private cloud from major vendors such as AWS, Google, Microsoft, and Azure, as well as Coreweave, Lambda Labs, Snowflake and others.

In its early days, OctoAI’s main focus was on optimizing models for better performance. Using the Apache TVM machine learning compiler framework, the company later launched its TVM-as-a-Service platform. Over time, this platform expanded into a full-fledged model-serving offering, combining optimization capabilities with a DevOps platform. With the growing popularity of generative AI, OctoAI then released a fully managed platform to assist users in serving and fine-tuning pre-existing models. The core of this platform is now known as OctoStack, but is specifically designed for private deployments.

During a recent interview, OctoAI CEO and co-founder, Luis Ceze, shared that the platform currently boasts over 25,000 developers and hundreds of paying customers using it in production. While many of these customers are what Ceze refers to as “GenAI-native” companies, there is a significantly larger market of traditional enterprises eager to adopt generative AI. It’s not surprising then, that OctoAI is now targeting this market with OctoStack.

“One thing that became clear is that, as the enterprise market moves from experimenting last year to deploying this year, one, all of them are looking around because they’re nervous about sending data over an API,” Ceze explained. “Two: a lot of them have already invested in their own computing, so why would they buy an API when they can simply use their own? And three, no matter the certifications and reputations, they still consider their AI as precious as their data and are wary of sending it over. There is a clear demand in the enterprise for complete control over deployment.”

According to Ceze, the team has been working on the architecture for both the SaaS and hosted platforms for quite some time. While the SaaS platform is optimized for Nvidia hardware, OctoStack has a much broader range of hardware support, including AMD GPUs and AWS’s Inferentia accelerator. This presents a greater optimization challenge but also highlights OctoAI’s strengths.

The deployment process for OctoStack should be relatively straightforward for most enterprises, as OctoAI provides read-to-go containers and associated Helm charts for easy deployment. For developers, the API remains the same, whether they are using the SaaS platform or OctoAI in their private cloud.

The primary use case for enterprises is to use text summarization and RAG to enable their employees to chat with internal documents. However, some companies are also fine-tuning models to run their own code generation on their internal code bases, similar to the capabilities GitHub offers to Copilot Enterprise users.

Having complete control over deployment in a secure environment is a critical factor for many enterprises looking to incorporate these technologies into their daily operations for employees and customers alike.

“For our performance- and security-sensitive use case, it is imperative that the models which process calls data run in an environment that offers flexibility, scale and security,” said Joshua Kennedy-White, CRO at Apate AI. “OctoStack lets us easily and efficiently run the customized models we need, within environments that we choose, and deliver the scale our customers require.”

Avatar photo
Dylan Williams

Dylan Williams is a multimedia storyteller with a background in video production and graphic design. He has a knack for finding and sharing unique and visually striking stories from around the world.

Articles: 874

Leave a Reply

Your email address will not be published. Required fields are marked *