Google Announces Public Preview of Gemini 1.5 Pro on Vertex AI
Las Vegas, NV — During its annual Cloud Next conference, Google announced that the highly anticipated Gemini 1.5 Pro, its most advanced generative AI model, is now available in public preview on Vertex AI, the company’s enterprise-focused AI development platform.
First launched in February, Gemini 1.5 Pro is the latest addition to Google’s Gemini family of generative AI models. One of its standout features is the exceptional amount of data it can process, ranging from 128,000 to 1 million tokens. In this context, “tokens” refers to subdivided bits of raw data, such as the syllables “fan,” “tas,” and “tic” in the word “fantastic.”
One million tokens is equivalent to around 700,000 words or around 30,000 lines of code. That is roughly five times the amount of data that Anthropic’s flagship model, Claude 3, can take as input and about eight times the maximum context of OpenAI’s GPT-4 Turbo.
The context, or context window, of a model refers to the initial set of data (e.g. text) the model considers before generating output (e.g. additional text). This can include a simple question, a movie script, an email, an essay, or even an e-book.
Models with smaller context windows tend to struggle with retaining and recalling information from recent conversations, causing them to veer off topic. Models with larger context windows, by contrast, can better track the flow of information, producing more contextually rich responses and potentially reducing the need for fine-tuning and fact-checking.
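As a rough sketch of what this looks like in practice (not drawn from Google’s announcement), a developer on Vertex AI might count a document’s tokens before folding it into a prompt, to see how much of the context window it will occupy. The project ID, file name, and preview model identifier below are illustrative placeholders, not details from the announcement.

    import vertexai
    from vertexai.generative_models import GenerativeModel

    # Assumed project and region; replace with your own.
    vertexai.init(project="my-project", location="us-central1")

    # Placeholder preview model identifier; the exact name may differ.
    model = GenerativeModel("gemini-1.5-pro-preview-0409")

    # A long document to use as context (placeholder file).
    with open("screenplay.txt") as f:
        screenplay = f.read()

    # Count tokens first to see how much of the context window
    # the document will consume before sending the request.
    usage = model.count_tokens(screenplay)
    print(f"Prompt uses {usage.total_tokens} tokens")

    # Pass the full document as context alongside the instruction.
    response = model.generate_content(
        ["Summarize the key plot points of this screenplay:", screenplay]
    )
    print(response.text)

Counting tokens up front simply reports how large the prompt is; it is a convenient way to check whether a document fits within the model’s window before the request is made.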
The Power of a Million-Token Context
With a context window of one million tokens, Gemini 1.5 Pro boasts an impressive range of capabilities, according to Google. These include but are not limited to analyzing a code library, reasoning across lengthy documents, and even holding long conversations with chatbots.
Thanks to its multilingual and multimodal abilities, Gemini 1.5 Pro can also analyze and compare content across different languages in media such as TV shows, movies, radio broadcasts, conference call recordings, and more. In terms of time, one million tokens roughly translates to an hour of video or 11 hours of audio.
Additionally, Gemini 1.5 Pro can generate transcriptions for video clips, although the quality of these transcriptions is still being evaluated.
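A request along those lines could look roughly like the following sketch against the Vertex AI Python SDK; the Cloud Storage path, model identifier, and prompt are placeholders chosen for illustration rather than details from Google’s materials.

    import vertexai
    from vertexai.generative_models import GenerativeModel, Part

    vertexai.init(project="my-project", location="us-central1")

    # Placeholder preview model identifier; the exact name may differ.
    model = GenerativeModel("gemini-1.5-pro-preview-0409")

    # Reference a recording already stored in Cloud Storage (placeholder path)
    # instead of uploading the file inline with the request.
    recording = Part.from_uri(
        "gs://my-bucket/earnings-call.mp4", mime_type="video/mp4"
    )

    # Ask for both a transcript and a breakdown of what is discussed.
    response = model.generate_content(
        [recording, "Transcribe this recording and list the main topics discussed."]
    )
    print(response.text)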
In a pre-recorded demo earlier this year, Google showed Gemini 1.5 Pro searching the transcript of the Apollo 11 moon landing telecast (which comes to about 400 pages) for quotes containing jokes, and then finding a scene in movie footage that looked similar to a pencil sketch.
Some early users of Gemini 1.5 Pro, including United Wholesale Mortgage, TBS, and Replit, have already reported successes in leveraging the large context window for various tasks. These include mortgage underwriting, automating metadata tagging on media archives, and generating, explaining, and transforming code.
Optimizing for Latency and Integration with Google Products
It’s worth noting that Gemini 1.5 Pro may not process a million tokens instantaneously. In previous demos, each search took between 20 seconds and a minute to complete, significantly longer than the average ChatGPT query.
However, Google has stated that it is actively working to improve latency and optimize Gemini 1.5 Pro over time.
In addition to the public preview on Vertex AI, Gemini 1.5 Pro is being integrated into other parts of Google’s product ecosystem. As announced on Tuesday, the model (currently in private preview) will power new features in Code Assist, Google’s generative AI coding assistance tool, allowing developers to make large-scale changes across codebases, such as updating cross-file dependencies and reviewing large chunks of code.
[…] “Vertex AI Agent Builder makes it incredibly simple and fast for individuals to create conversational agents,” stated Google Cloud CEO Thomas Kurian. “Users are able to construct and deploy production-ready agents powered by generative AI, and instruct and guide them just as they would a human, enhancing the accuracy and quality of responses from the models.” […]