From left: OpenAI CEO Sam Altman and CPO Kevin Weil | Credit: Kyle Kabasares on YouTube

OpenAI’s Sam Altman: AI Agents to Up-level Human Activities Soon

TLDR

  • OpenAI CEO Sam Altman sees AI agents, bots that accomplish tasks such as booking an Uber for a user, changing the world in a “very significant” way. They could be ubiquitous as soon as 2030.
  • During its developer conference, DevDay, Altman said OpenAI’s most powerful LLM, OpenAI o1, is also its safest due to its reasoning capabilities.
  • Altman’s favorite AI model from a competitor? NotebookLM, Google’s text-to-podcast tool.

OpenAI CEO Sam Altman believes that each person could soon be assisted by AI agents working in the background, which would significantly raise the level of human activity.

At the startup’s second annual developer conference, DevDay, Altman said he expects to see a “very significant change in the way the world works in a short amount of time.”

By 2030, he surmised, each person could have AI agents assisting them with multiple tasks, letting people accomplish far more, far faster, than before. So much so that he believes we will say, “This is what humans are supposed to be capable of.”

AI agents represent Level 3 of five levels toward AGI, Altman added. Level 1 is conversational AI (chatbots). Level 2 is reasoning AI, which Altman said OpenAI achieved with OpenAI o1, its latest and most powerful large language model, one that ‘thinks’ through a chain-of-thought process before answering. Level 4 is innovators, or AI that can improve upon human processes on its own. Level 5 is organizations, where AI can run an entire organization without human intervention.

“We’ve clearly gotten to Level 2,” Altman said, calling its capabilities “impressive.”

By 2025, AI agents (Level 3) will become widespread, added OpenAI Chief Product Officer Kevin Weil, who interviewed Altman during a fireside chat at DevDay.

But with AI agents taking action on your behalf, safety and alignment issues have become more critical than ever. (Alignment refers to AI systems being aligned with human values to avoid causing harm.)

Altman said that, despite what’s been said online, “we really do care a lot about building safe AI systems.” OpenAI approaches AI safety from both ends: addressing what is immediately in front of it and what might be coming.

This is how it approaches research too, Altman said. The company is keen on creating new paradigms and pushing the frontier. “That’s what motivates us,” he added. “We love it.”

OpenAI o1, the startup’s reasoning-focused and most powerful LLM, is also its “most aligned model by a lot,” Altman said.

Asked to name his favorite AI model from a competitor, Altman said “NotebookLM,” Google’s wildly imaginative text-to-podcast model.

“This is really cool,” he said. “Very well done.”

Separately, The Information is reporting that SoftBank invested $500 million in OpenAI’s financing round that just closed. Apple reportedly backed out of the round.

OpenAI said in a blog post that it raised $6.6 billion at a valuation of $157 billion. “The new funding allows us to double down on our leadership in frontier AI research, increase compute capacity, and continue building tools that help people solve hard problems,” OpenAI said.

What OpenAI announced

At DevDay, OpenAI unveiled the following new tools and capabilities for developers:

  • Realtime API – Now in public beta, it lets paying developers build low-latency, multimodal experiences in their apps, supporting speech-to-speech conversations with six preset voices. OpenAI also unveiled audio input and output in the Chat Completions API for use cases that don’t require low latency: send any text or audio to GPT-4o and get back text, audio, or both (see the first sketch after this list).
  • Vision fine-tuning on GPT-4o – Developers can now fine-tune with images in addition to text, improving the model’s image understanding and enabling capabilities such as enhanced visual search (see the second sketch below).
  • Prompt caching in the API – It lets developers reduce costs and latency by reusing recently seen input tokens, which translates to a 50% discount on those tokens and faster prompt processing (see the third sketch below).
  • Model distillation in the API – It gives developers an integrated workflow to manage the entire distillation pipeline within the OpenAI platform, so they can easily use the outputs of frontier models to fine-tune and improve the performance of less expensive models such as GPT-4o mini (see the final sketch below).

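For developers curious about the new audio support in the Chat Completions API, here is a minimal sketch using OpenAI’s Python SDK. It assumes an audio-capable GPT-4o snapshot (named gpt-4o-audio-preview at launch) and an OPENAI_API_KEY in the environment; exact field names may evolve while the feature is in beta.

```python
import base64

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Ask GPT-4o for both text and spoken audio in a single response.
response = client.chat.completions.create(
    model="gpt-4o-audio-preview",  # audio-capable GPT-4o snapshot
    modalities=["text", "audio"],  # request text plus audio output
    audio={"voice": "alloy", "format": "wav"},  # one of the preset voices
    messages=[{"role": "user", "content": "Summarize DevDay in one sentence."}],
)

# The audio arrives base64-encoded alongside the usual text fields.
wav_bytes = base64.b64decode(response.choices[0].message.audio.data)
with open("devday_summary.wav", "wb") as f:
    f.write(wav_bytes)
```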
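Vision fine-tuning follows the existing fine-tuning workflow; the only change is that training examples may now include images. A minimal sketch, in which the image URL, file name, and example content are hypothetical placeholders (gpt-4o-2024-08-06 is the snapshot OpenAI documented as supporting vision fine-tuning at launch):

```python
import json

from openai import OpenAI

client = OpenAI()

# Each training example is a chat transcript; images ride along as
# image_url content parts, just like a normal GPT-4o vision request.
example = {
    "messages": [
        {"role": "user", "content": [
            {"type": "text", "text": "What product is shown here?"},
            {"type": "image_url",
             "image_url": {"url": "https://example.com/catalog/item-42.jpg"}},
        ]},
        {"role": "assistant", "content": "A stainless-steel dive watch."},
    ]
}

with open("vision_train.jsonl", "w") as f:
    f.write(json.dumps(example) + "\n")  # real datasets need many examples

# Upload the data and start the fine-tuning job.
training_file = client.files.create(file=open("vision_train.jsonl", "rb"),
                                    purpose="fine-tune")
job = client.fine_tuning.jobs.create(training_file=training_file.id,
                                     model="gpt-4o-2024-08-06")
print(job.id, job.status)
```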
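Prompt caching requires no code changes: the API automatically reuses recently seen input tokens when prompts longer than roughly 1,024 tokens share a common prefix. The sketch below, built around a made-up system prompt, shows how to structure requests to benefit from it and how to confirm a cache hit:

```python
from openai import OpenAI

client = OpenAI()

# Caching keys on the *prefix* of the prompt, so put the long, static
# part (instructions, reference material) first and the per-request
# part last, so consecutive calls share that prefix.
SYSTEM_PROMPT = "You are a support assistant.\n" + ("Policy text... " * 400)

for question in ["How do I reset my password?", "What is the refund window?"]:
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},  # shared prefix
            {"role": "user", "content": question},         # varying suffix
        ],
    )
    # cached_tokens > 0 on the second call indicates a cache hit,
    # and those tokens are billed at the discounted rate.
    print(response.usage.prompt_tokens_details.cached_tokens)
```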
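The distillation workflow starts by storing a frontier model’s completions on the platform, which can then serve as training data for a cheaper model. A minimal sketch of the capture step, with a hypothetical metadata tag:

```python
from openai import OpenAI

client = OpenAI()

# Step 1: capture the frontier ("teacher") model's outputs by storing
# completions on the platform, tagged for easy filtering later.
response = client.chat.completions.create(
    model="gpt-4o",  # teacher model whose outputs we want to distill
    store=True,      # persist this completion on the OpenAI platform
    metadata={"task": "distillation-demo"},  # hypothetical tag
    messages=[{"role": "user", "content": "Explain prompt caching briefly."}],
)

# Step 2 happens in the OpenAI dashboard: filter stored completions by
# the metadata tag and use them as the training set for a fine-tuning
# job on a less expensive "student" model such as GPT-4o mini.
print(response.id)  # stored completion ID, visible in the dashboard
```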