Google Introduces Gemini

Today, Google made a series of announcements about a major AI milestone: Gemini, their largest and most capable AI model to date.
The news spanned the world of enterprise, developers and consumers, so I thought it would be helpful to summarize the main announcements and provide links to the most useful blog posts.
Today’s news
Today, Google announced Gemini — the most capable general AI model they have ever built. It is the result of large-scale collaborative efforts by teams across Google, including Google DeepMind and Google Research, and is their largest science and engineering project ever.
Google has optimized Gemini 1.0, their first version of the model, for three different sizes:
- Gemini Ultra — their most capable and largest model for highly-complex tasks
- Gemini Pro — their best model for scaling across a wide range of tasks
- Gemini Nano — their most efficient model for on-device tasks
What is Gemini?
Gemini is a multimodal AI model. This means that it can generalize and seamlessly understand, operate across and combine different types of information, including:
- Text
- Images
- Audio
- Video
- Coding languages
It’s also their most flexible model yet, able to efficiently run on everything from mobile devices to data centres. Gemini will significantly enhance the way developers and enterprise customers build and scale with AI.
Built on next-generation capabilities
Until now, the standard approach to creating multimodal models involved training separate components for different modalities and then stitching them together to roughly mimic some of this functionality. These models can sometimes be good at performing certain tasks like describing images, but struggle with more conceptual and complex reasoning.
So Google designed Gemini to be natively multimodal — pre-trained from the start on different modalities. Then they fine-tuned it with additional multimodal data to further refine its effectiveness. This helps Gemini seamlessly understand and reason about all kinds of inputs from the ground up, far better than existing multimodal models — and its capabilities are state-of-the-art in nearly every domain.
Learn more about Gemini’s capabilities and see how it works.
Benchmarking tests
Google has been rigorously testing their Gemini models and evaluating their performance on a wide variety of tasks. From natural image, audio and video understanding to mathematical reasoning, Gemini Ultra’s performance exceeds current state-of-the-art results on 30 of the 32 academic benchmarks widely used in large language model research and development.
You can see more details in this technical whitepaper.
Making Gemini available to the world
Gemini 1.0 is now rolling out across a range of products and platforms:
For consumers
- Starting today, Bard — using a fine-tuned version of Gemini Pro — will be available in English in more than 170 countries and territories. It will be far more capable at things like understanding and summarizing, reasoning, brainstorming, writing and planning. Google is enthusiastic about bringing Bard’s generative AI potential to Canadians soon.
- Google is also bringing Gemini to Pixel. Pixel 8 Pro is the first smartphone engineered to run Gemini Nano, which powers new features like Summarize in the Recorder app and is rolling out in Smart Reply in Gboard, starting with WhatsApp, with more messaging apps coming next year.
- And in the coming months, Gemini will be available in more of their core products and services like Search, Ads, Chrome, and Duet AI.
For developers
- Starting on December 13, developers and enterprise customers can access Gemini Pro via the Gemini API in Google AI Studio and Vertex AI:
- Google AI Studio is a free, web-based developer tool that helps developers and enterprise customers prototype and launch apps quickly with an API key.
- When it’s time for a fully managed AI platform, Vertex AI allows customization of Gemini with full data control, along with additional Google Cloud features for enterprise security, safety, privacy, and data governance and compliance.
- Android developers will also be able to build with Gemini Nano, their most efficient model for on-device tasks, via AICore. AICore is a new system capability available in Android 14, starting on Pixel 8 Pro devices. Sign up for an early preview.
- And as part of their extensive trust and safety checks for Gemini Ultra, they will make it available to select customers, developers and partners for early experimentation and feedback before making it broadly available to developers and enterprise customers early next year.
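To make the developer story above a little more concrete, here is a minimal sketch of what a request to the Gemini API might look like. This is an assumption on my part, not official sample code: the endpoint path, the `gemini-pro` model name, and the JSON payload shape are based on my reading of the launch documentation, and the API key is a placeholder you would get from Google AI Studio.

```python
import json

# Placeholder — obtain a real key from Google AI Studio.
API_KEY = "YOUR_API_KEY"

# Assumed REST endpoint for single-turn text generation with Gemini Pro.
ENDPOINT = (
    "https://generativelanguage.googleapis.com/v1beta/"
    f"models/gemini-pro:generateContent?key={API_KEY}"
)

def build_request(prompt: str) -> dict:
    """Build the assumed JSON body for a single-turn text prompt."""
    return {"contents": [{"parts": [{"text": prompt}]}]}

# Example: construct (but don't send) a request body.
body = build_request("Summarize today's Gemini announcement in one sentence.")
print(json.dumps(body))
```

In practice you would POST that body to the endpoint (with any HTTP client) and read the generated text out of the response; Google also offers SDKs so most developers never build the raw request by hand.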
Looking ahead
This is a significant milestone in the development of AI, and the start of a new era for Google as they continue to rapidly innovate and responsibly advance the capabilities of their models. They’ve made great progress on Gemini so far and they’re working hard to further extend its capabilities for future versions.
This entry was posted on December 6, 2023 at 11:05 am and is filed under Commentary with tags Google. You can follow any responses to this entry through the RSS 2.0 feed.
You can leave a response, or trackback from your own site.