Barely a week after launching the latest iteration of its Gemini models, Google today announced Gemma, a new family of lightweight open-weight models. Starting with Gemma 2B and Gemma 7B, these new models were inspired by Gemini and are available for commercial and research usage.
Google did not provide us with a detailed paper on how these models perform against similar models from Meta and Mistral, for example, and only noted that they are state-of-the-art.
The company did note, though, that these are dense decoder-only models, the same architecture it used for its Gemini models (and its earlier PaLM models), and said that benchmarks will appear later today on Hugging Face’s leaderboard.
To get started with Gemma, developers can use ready-to-use Colab and Kaggle notebooks, as well as integrations with Hugging Face, MaxText and Nvidia’s NeMo. Once pre-trained and tuned, these models can then run everywhere.
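For readers who want a feel for what the Hugging Face route might look like, here is a minimal sketch using the Transformers library. It is not Google's official quickstart. The model ID `google/gemma-2b`, the helper name `generate_text` and the token limit are assumptions made for illustration, and the sketch presumes a recent Transformers release with Gemma support, PyTorch installed, and an account that has accepted the Gemma terms of use.

```python
"""Minimal sketch: load a Gemma checkpoint through Hugging Face Transformers."""

# Assumption: the checkpoint is published on the Hugging Face Hub under this ID
# and the account in use has accepted Google's terms of use for Gemma.
MODEL_ID = "google/gemma-2b"


def generate_text(prompt: str, model_id: str = MODEL_ID, max_new_tokens: int = 40) -> str:
    """Return the prompt plus a short model-generated continuation.

    Hypothetical helper for illustration; not part of any library.
    """
    # Imported lazily so this file can be imported without the dependency.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Both calls download files from the Hub on first use.
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)

    # Tokenize the prompt into PyTorch tensors and generate a continuation.
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)

    # Decode the first (and only) sequence back into text.
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate_text("Open models are useful because")` would fetch the weights the first time it runs, so expect a sizable download before any text comes back.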
While Google highlights that these are open models, it’s worth noting that they are not open-source. Indeed, in a press briefing ahead of today’s announcement, Google’s Janine Banks stressed the company’s commitment to open source but also noted that Google is very intentional about how it refers to the Gemma models.
“[Open models] has become pretty pervasive now in the industry,” Banks said. “And it often refers to open weights models, where there is wide access for developers and researchers to customize and fine-tune models but, at the same time, the terms of use — things like redistribution, as well as ownership of those variants that are developed — vary based on the model’s own specific terms of use. And so we see some difference between what we would traditionally refer to as open source and we decided that it made the most sense to refer to our Gemma models as open models.”
That means developers can use the models for inference and fine-tune them at will. Google’s team argues that these model sizes are a good fit for a lot of use cases.
“The generation quality has gone significantly up in the last year,” Google DeepMind product management director Tris Warkentin said. “Things that previously would have been the remit of extremely large models are now possible with state-of-the-art smaller models. This unlocks completely new ways of developing AI applications that we’re pretty excited about, including being able to run inference and do tuning on your local developer desktop or laptop with your RTX GPU or on a single host in GCP with Cloud TPUs, as well.”
That is true of the open models from Google’s competitors in this space as well, so we’ll have to see how the Gemma models perform in real-world scenarios.
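For a rough sense of why models in this size class are within reach of a single GPU or a laptop, the back-of-the-envelope sketch below multiplies a parameter count by the bytes each parameter takes at common precisions. The round figures of 2 billion and 7 billion parameters are assumptions read off the model names, not exact counts from Google, and the estimate covers the weights only, not activations, caches or optimizer state.

```python
# Bytes needed to store one parameter at common numeric precisions.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "int8": 1}

# Assumption: nominal sizes taken from the model names; the exact parameter
# counts may differ from these round numbers.
NOMINAL_PARAMS = {"Gemma 2B": 2_000_000_000, "Gemma 7B": 7_000_000_000}


def weight_memory_gb(num_params: int, dtype: str) -> float:
    """Memory for the weights alone, in gigabytes (10**9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


def fits_in_memory(num_params: int, dtype: str, budget_gb: float) -> bool:
    """True if the weights alone fit within the given memory budget."""
    return weight_memory_gb(num_params, dtype) <= budget_gb


# Print a small table of weight sizes for each model and precision.
for name, count in NOMINAL_PARAMS.items():
    for dtype in BYTES_PER_PARAM:
        print(f"{name} @ {dtype}: {weight_memory_gb(count, dtype):.1f} GB")
```

By this estimate, a nominal 7-billion-parameter model needs about 14 GB for its weights at 16-bit precision and about 7 GB at 8-bit, which is why precision matters so much on consumer hardware.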
In addition to the new models, Google is also releasing a new responsible generative AI toolkit to provide guidance and essential tools for creating safer AI applications with Gemma, as well as a debugging tool.