TL;DR
- OpenAI’s new GPT-4.1 family offers improved coding and instruction-following abilities while supporting one million token context windows. GPT-4.1 is 26% cheaper than GPT-4o and comes with deeper agentic capabilities for tasks like multi-document analysis.
- Meta’s new Llama 4 Scout and Maverick are natively multimodal, with Scout offering an unprecedented 10 million token context window. Meta also previewed Llama 4 Behemoth, which it claims outperforms competitors like GPT-4.5 and Gemini 2.0 Pro on STEM benchmarks.
- Meta said its Llama 4 models show less left-leaning bias, aligning closer to Elon Musk’s Grok AI chatbot in terms of content filtering.
OpenAI and Meta recently announced major upgrades to their flagship AI models: the GPT-4.1 family and Llama 4 collection, respectively.
OpenAI’s new GPT-4.1, GPT-4.1 mini, and GPT-4.1 nano are more economical than previous models, with the nano version being the company’s cheapest and fastest model to date, OpenAI said in a blog post. The models offer a one million token context window and have a knowledge cutoff date of June 2024.
GPT-4.1 shows particular strength in coding, scoring 54.6% on SWE-bench Verified – a 26.6 percentage point improvement over GPT-4.5. It also demonstrated enhanced instruction following capabilities, with a 10.5 percentage point increase over GPT-4o on Scale’s MultiChallenge benchmark, which measures an LLM’s ability to handle ongoing conversations with humans.
OpenAI said GPT-4.1 is 26% cheaper than GPT-4o for median queries. The startup said it is also raising its prompt caching discount to 75% from 50%. It also won’t charge extra for long context requests. The GPT-4.1 models are available only through its API.
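The 75% prompt caching discount matters most for repeated long-context requests. As a rough sketch, the effect can be estimated like this; the 75% discount figure comes from the article, but the per-million-token prices below are illustrative assumptions, not quoted OpenAI figures:

```python
# Back-of-the-envelope cost of a GPT-4.1 request with prompt caching.
# CACHE_DISCOUNT reflects the 75% cached-input discount mentioned above;
# the dollar prices are assumed placeholders for illustration only.
INPUT_PER_M = 2.00        # $ per 1M uncached input tokens (assumed)
OUTPUT_PER_M = 8.00       # $ per 1M output tokens (assumed)
CACHE_DISCOUNT = 0.75     # cached input billed at 25% of the input price

def request_cost(input_tokens, cached_tokens, output_tokens):
    """Dollar cost for one request, splitting cached vs. fresh input tokens."""
    fresh = input_tokens - cached_tokens
    cached_price = INPUT_PER_M * (1 - CACHE_DISCOUNT)
    return (fresh * INPUT_PER_M
            + cached_tokens * cached_price
            + output_tokens * OUTPUT_PER_M) / 1_000_000

# A 100k-token prompt where 80k tokens hit the cache:
print(round(request_cost(100_000, 80_000, 2_000), 4))   # → 0.096
# The same prompt with no cache hits costs more than twice as much:
print(round(request_cost(100_000, 0, 2_000), 4))        # → 0.216
```

Under these assumed prices, caching 80% of a long prompt cuts the request cost by more than half, which is why the discount is significant for agentic workloads that re-send large contexts.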
OpenAI noted that enhanced instruction following reliability and long context comprehension make the GPT-4.1 models “considerably more effective at powering agents, or systems that can independently accomplish tasks on behalf of users.”
Early testing by companies provided concrete examples of the models’ capabilities. Thomson Reuters reported a 17% improvement in multi-document review accuracy with GPT-4.1 compared to GPT-4o, while investment firm Carlyle found 50% better performance on retrieving granular financial data from complex documents such as PDFs and Excel files.
Llama 4’s 10 million token context window
Meanwhile, Meta unveiled open-source Llama 4 Scout and Llama 4 Maverick, describing them as “the first open-weight natively multimodal models with unprecedented context length support” and the company’s first models built using a mixture-of-experts (MoE) architecture.
Notably, Scout offers a 10 million token context window, which is a record. The previous record was held by Google’s Gemini, which stood at one million tokens and was set to expand to two million.
The company also previewed Llama 4 Behemoth, which it claims outperforms competitors including GPT-4.5, Claude Sonnet 3.7 and Gemini 2.0 Pro on STEM-focused benchmarks.
Llama 4 Scout features 17 billion active parameters (109 billion total) with 16 experts, while Llama 4 Maverick has 17 billion active parameters (400 billion total) with 128 experts. According to Meta, Maverick can be run on a single Nvidia H100 DGX host, with Scout designed to fit on one H100 GPU.
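The gap between “active” and total parameters comes from the MoE design: a router sends each token to only a few experts, so only a fraction of the network’s weights run per token. Below is a minimal sketch of top-k MoE routing with toy dimensions; the expert count of 16 mirrors Scout, but the hidden size, top-k choice, and random weights are all illustrative assumptions, not Llama 4’s actual configuration:

```python
import numpy as np

# Toy top-k mixture-of-experts (MoE) routing sketch. Sizes are
# placeholders, not real Llama 4 dimensions.
rng = np.random.default_rng(0)

d_model = 8          # hidden size (toy value)
n_experts = 16       # Scout reportedly uses 16 experts
top_k = 1            # experts activated per token (assumed)

# Each "expert" is a small feed-forward weight matrix.
experts = rng.standard_normal((n_experts, d_model, d_model))
router = rng.standard_normal((d_model, n_experts))

def moe_forward(x):
    """Route each token to its top-k experts; only those weights run."""
    logits = x @ router                          # (tokens, n_experts)
    chosen = np.argsort(-logits, axis=-1)[:, :top_k]
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        # Softmax over just the selected experts' scores.
        sel = logits[t, chosen[t]]
        weights = np.exp(sel) / np.exp(sel).sum()
        for w, e in zip(weights, chosen[t]):
            out[t] += w * (x[t] @ experts[e])
    return out, chosen

tokens = rng.standard_normal((4, d_model))
out, chosen = moe_forward(tokens)
print(out.shape, chosen.shape)
```

Because only `top_k` of the `n_experts` expert matrices multiply each token, the compute per token scales with the active parameters (17 billion) rather than the total (109 billion for Scout, 400 billion for Maverick), which is how Maverick can fit on a single H100 DGX host.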
Both models integrate text and vision tokens through early fusion, and were pre-trained on more than 30 trillion tokens across diverse text, image and video datasets.
Meta also said it is working to remove bias from its AI models, with Llama 4 now performing similarly to Elon Musk’s Grok AI chatbot, which has fewer filters. For example, Llama 4 is more inclined to allow political and social debates than Llama 3.3.
Llama 4 Maverick “offers unparalleled, industry-leading performance in image and text understanding,” excelling at precise image comprehension and creative writing, according to Meta.
Both Llama 4 Scout and Maverick are available for download on Llama.com and Hugging Face, with Meta AI integrations in WhatsApp, Messenger, Instagram Direct and the Meta.AI website.