Maximizing Start-Up Success: 5 Key Strategies for Effective LLM Deployment

An April 2023 Arize survey found that 53% of respondents planned to deploy LLMs within the next year or sooner. Yet deployment is not cheap: Nvidia’s H100 GPU, a popular choice for LLM workloads, has been selling on the secondary market for about $40,000 per chip, and one source estimated it would take roughly 6,000 chips to train an LLM comparable to ChatGPT-3.5. The same source estimated that running ChatGPT-3.5 consumes about 1 GWh a day, the combined daily energy usage of 33,000 households. Power consumption can also be a pitfall for user experience when running LLMs on portable devices.

The Rise of Large Language Models with ChatGPT’s Launch

ChatGPT’s launch marked the beginning of a new era of possibilities opened up by large language models. Already, we’ve seen the rise of OpenAI’s offerings, Google’s LaMDA family of LLMs, the BLOOM project, Meta’s LLaMA, and Anthropic’s Claude.

But this is only the beginning. According to a survey conducted in April 2023 by Arize, 53% of respondents were planning to implement LLMs within the next year or sooner. It’s evident that more groundbreaking language models will continue to emerge.

One exciting approach to incorporating LLMs is creating “vertical” models tailored to a specific domain. This involves retraining or fine-tuning existing LLMs with domain-specific knowledge, making them suitable for industries such as life sciences, pharmaceuticals, insurance, finance, and more.
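As a rough illustration, here is a minimal sketch of what such domain adaptation could look like with the Hugging Face transformers library. The base model (gpt2, standing in for a much larger LLM) and the domain_corpus.txt file are illustrative assumptions; a real vertical model would involve far more data curation, compute, and evaluation.

```python
# Minimal sketch: adapting a general-purpose LLM to a domain corpus.
# Assumptions: transformers and datasets are installed, "gpt2" stands in
# for a larger base model, and domain_corpus.txt is a hypothetical file
# with one domain document per line (e.g., pharma or insurance text).
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)
from datasets import load_dataset

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Load the domain-specific text and tokenize it.
dataset = load_dataset("text", data_files={"train": "domain_corpus.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset["train"].map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="vertical-model",
        num_train_epochs=1,
        per_device_train_batch_size=2,
    ),
    train_dataset=tokenized,
    # The collator builds causal-LM labels from the inputs (mlm=False).
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

Retraining an LLM from scratch is rarely practical at this scale, which is why fine-tuning a pretrained base model is the usual route to a vertical model.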

However, deploying an LLM successfully will require careful execution.

While LLMs have proven to be powerful tools for gaining a competitive edge, their use has also raised concerns. One major issue is their tendency to generate incorrect information, also known as “hallucinations.” This problem can distract leaders from addressing crucial issues in how these outputs are generated.

The Challenges of Training and Deploying an LLM

One significant roadblock in using LLMs is their high operating cost, a consequence of the intensive computational demands of training and running these language models (after all, they’re called large language models for a reason).

The excitement surrounding LLMs is undeniable, but developing and implementing them comes with several feasibility hurdles.

  1. The high cost of hardware: The recommended hardware for running LLMs is expensive. For example, the popular H100 GPU from Nvidia can cost up to $40,000 per chip on the secondary market. According to estimates, it would take around 6,000 chips to train a model as large as ChatGPT-3.5, putting the bill for GPUs alone at a whopping $240 million.
  2. The cost of electricity: The power needed to train an LLM is estimated at 10 gigawatt-hours (GWh), equivalent to the yearly electricity usage of 1,000 US homes. Once trained, the model’s power consumption varies, but it can still be significant: running ChatGPT-3.5 is estimated to consume 1 GWh per day, equivalent to the daily energy consumption of 33,000 households (see the back-of-envelope sketch after this list).
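
These estimates are easy to sanity-check. The short sketch below reproduces the back-of-envelope arithmetic; every input is one of the article’s rough figures (plus an assumed per-home consumption rate), not a measured value.

```python
# Back-of-envelope reproduction of the cost estimates cited above.
# All inputs are rough published estimates, not measured values.

# 1. Hardware cost
chips = 6_000            # estimated H100s to train a ChatGPT-3.5-class model
price_per_chip = 40_000  # approximate secondary-market price, USD
print(f"GPU spend: ${chips * price_per_chip:,}")  # -> $240,000,000

# 2. Training electricity
training_gwh = 10           # estimated training energy, GWh
home_kwh_per_year = 10_000  # assumes a typical US home uses ~10,000 kWh/year
homes_per_year = training_gwh * 1_000_000 / home_kwh_per_year
print(f"Training energy = yearly usage of {homes_per_year:,.0f} homes")  # -> 1,000

# 3. Inference electricity
inference_gwh_per_day = 1  # estimated daily energy to run the model
home_kwh_per_day = 30      # the 33,000-household figure implies ~30 kWh/home/day
homes_per_day = inference_gwh_per_day * 1_000_000 / home_kwh_per_day
print(f"Daily inference = daily usage of {homes_per_day:,.0f} homes")  # -> ~33,333
```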

Furthermore, high power consumption poses a challenge for user experience, especially on portable devices. Heavy use of an LLM on a device can quickly drain its battery, which could hinder adoption among consumers.

As we continue to explore the potential of large language models, it is essential to approach their development and use with careful consideration and attention to these feasibility challenges.
