Press "Enter" to skip to content
Credit: Google

Google’s Gemini 2.0 to Enable a World of AI Agents

Google today unveiled the latest and most powerful version of its flagship AI model, Gemini 2.0, as the AI wars continue to heat up.

Gemini 2.0, like its predecessor, is natively multimodal, meaning it was built from the start to take in different content formats (text, video, audio, code) instead of each capability being bolted on later. With improved performance, it can better enable AI agents, or bots that work behind the scenes by themselves to do tasks for the user.

Google released just one AI model size in the Gemini 2.0 family of models, for now: Gemini 2.0 Flash, which is smaller and faster with low latency. Google said Gemini 2.0 Flash even outperforms the full-size Gemini 1.5 Pro model on key benchmarks and at twice the speed.

Flash can now churn out multimodal results – not just receive them – and it also has multilingual audio. It can access Google Search, execute code and enable other functions from third parties.

Gemini 2.0 Flash can enable a “new class” of AI agent experiences, according to Google, due to its user interface being natively capable of action, improvements in multimodal reasoning, long context understanding (where AI agents can comprehend lengthy prompts), complex instruction-following and planning (it can perform tasks that require multiple steps), with less lag, among other features.

What does all this mean? With Gemini 2.0, Google is now closer to developing a full-fledged, universal AI assistant that can do many different types of tasks, answer all types of queries, engage in different content formats (photos, audio, video, code), has real-time access to the internet, among other capabilities. It is like having a super-smart and capable assistant with you at all times.

Improved Astra and the return of Google glasses

An example of this all-purpose AI assistant is Project Astra, which it first introduced in May. Google said it has been getting feedback from developers and the latest version was built using Gemini 2.0, leading to improvements including better dialogue, more tools (Google Search, Maps, Lens), more memory to remember your conversations (10-minute window) and less lag.

Google said it is testing prototype smart glasses to work with Astra. These are normal-looking glasses, not the sci-fi-looking version worn by co-founder Sergey Brin in 2012.

To power these glasses, Google introduced a new mobile operating system called Android XR. “We want there to be lots of choices of stylish, comfortable glasses you’ll love to wear every day,” Google’s XR vice president Shahram Izadi said in a blog post. Android XR will be an open, unified platform for mixed reality headsets and glasses.

Gemini 2.0 Flash is now available to developers as an experimental model via the Gemini API in Google AI Studio and Vertex AI (via Google Cloud).

Google unveiled Gemini 2.0 as rival OpenAI, the creator of ChatGPT, took aim at Google’s core search business. Last month, OpenAI introduced ChatGPT search and even rolled out a Chrome browser extension that lets users bypass Google Search entirely. OpenAI also just released its much-anticipated video-generation model, Sora.

Project Mariner and Jules

Google is not done innovating, yet. It is experimenting with a web browser that can do tasks on your behalf, not just take you to different websites or does searches for you. It is embedding AI agents into the browser. This research prototype is called Project Mariner.

The browser can understand and reason based on the information on your screen – including images, text, code, forms – and uses that information to do tasks for you through a Chrome extension. Google said Mariner garnered an 83.5% success rate for single agents on the WebVoyager benchmark.

How it works: Let’s say you want to look up email contact information for a list of companies you wrote down in Google Sheets. You can ask Mariner (in a chat space that opens up) to find it. The AI agent will read the list and search Google for these companies. It will then click on each company’s website and search for contact emails – then send them to you. During the task, the agent explains what it is doing in the chat space; you can also tell it to stop its activities at any time.

Google said Mariner is still in its early stages, and it is conducting research into what risks might arise. It notes that the AI agent will ask for final confirmation before doing certain tasks, like buying something. The search giant said testers are now putting Mariner through its paces.

Google also unveiled an AI agent for developers, called Jules. It integrates directly into GitHub; it can “tackle an issue, develop a plan and execute it,” with the developer’s approval. For example, developers can hand off bug-fixing tasks to Jules so they can concentrate on what they actually want to build.

Author

Be First to Comment

Leave a Reply

Your email address will not be published. Required fields are marked *