Foundation AI Models Galore: GPT-4.5, Claude 3.7 Sonnet and Grok 3

TLDR

Major AI players are rapidly releasing next-gen models, including OpenAI’s GPT-4.5, xAI’s Grok 3, and Anthropic’s Claude 3.7 Sonnet – each bringing strengths in reasoning, creativity and interactivity.
GPT-4.5 pushes boundaries in scale and reliability, with a massive 12.8 trillion parameters and a drastically lower hallucination rate, though its improvements come at a significantly higher cost.
Models like Grok 3 and Claude 3.7 emphasize personalization and advanced reasoning, with features such as distinct voice personalities and chain-of-thought analysis

The AI revolution is not slowing down, with new AI models coming out like clockwork: OpenAI introduced its last non-reasoning LLM, GPT-4.5, xAI rolled out Grok 3 and Anthropic unveiled Claude 3.7 Sonnet. Amazon is reportedly preparing to release a reasoning model as well. Following its newfound fame, DeepSeek is readying release of R2, the second version of its reasoning model.

Here’s a look at some of the latest AI models:

GPT-4.5

OpenAI’s GPT-4.5 – nicknamed ‘Orion’ internally – is the startup’s most powerful, non-reasoning model to date. It will also be OpenAI’s last. However, GPT-4.5 ends the series with a bang: It is the largest model with a rumored 12.8 trillion parameters and hallucinates far less than other OpenAI models. Its hallucination rate is 37% compared with 61.8% for GPT-4o, 44% for o1 and 80.3% for o3-mini.

OpenAI said it developed GPT-4.5 with greater ‘EQ’ or emotional quotient, which makes interacting with the model more natural. Key improvements include enhanced pattern recognition, the ability to generate creative insights without explicit reasoning, and a broader knowledge base. The model achieved higher performance by scaling unsupervised learning.

A notable advancement in GPT-4.5 is the reduction in the rate of hallucinations, where AI models generate incorrect information. The hallucination rate has been lowered to 37.1%, a significant improvement from previous models like GPT-4o, which had a rate of 61.8%.

However, better performance comes at a steep price. The initial GPT-4 API pricing was: $30/million input tokens and $60/million output tokens. GPT-4.5 is at $75/million input and $150/million output.

Grok 3

Elon Musk’s xAI has unveiled Grok 3, its most powerful LLM yet. It was trained on the startup’s Colossus supercluster with 200,000 GPUs; Grok 2 was trained with half the number of GPUs. The result is superior performance across various benchmarks, including reasoning, mathematics, coding, and instruction-following tasks. Notably, Grok 3 achieved an Elo score of 1402 in the Chatbot Arena, showing that it’s more skilled in conversation than other chatbot.

True to Musk’s proclivities, this model distinguishes itself by having fewer filters, a deliberate move to set it apart from competitors like OpenAI’s ChatGPT and Google’s Gemini. Grok 3 introduces voice personalities such as ‘romantic,’ ‘sexy,’ and ‘unhinged,’ with the latter two labeled as ’18+,’ according to the FT. This strategy aims to attract users seeking less constrained AI interactions, aligning with Musk’s vision of a “maximally truth-seeking AI” but one that has raised concerns.

Claude 3.7 Sonnet

Anthropic’s Claude 3.7 Sonnet is a hybrid reasoning model capable of both rapid responses and extended, in-depth analysis. This dual capability allows the model to adapt its reasoning approach based on the complexity of the task at hand, offering users a versatile AI assistant.

Claude 3.7 Sonnet also offers an extended thinking mode, where the model engages in detailed ‘chain of thought’ reasoning. This mode enhances its performance in areas such as mathematics, physics, coding, and complex problem-solving. Users can control the extent of this reasoning, balancing speed, cost and quality to suit their specific needs.

Anthropic also introduced Claude Code, an agentic coding tool that actively collaborates with developers. Claude Code can search and read code, edit files, write and run tests, and commit and push code to GitHub, while keeping developers informed. This tool is useful for tasks such as test-driven development and large-scale refactoring, significantly reducing development time and effort.

Author

Deborah Yao

View all posts