Technology
06 April 2025

Meta Launches Llama 4 Models, Advancing Its Push in AI Technology

The new models promise advanced capabilities and multimodal performance for developers and enterprises.

Meta has officially launched its latest collection of artificial intelligence models, named the Llama 4 herd, on April 5, 2025. This new lineup introduces two flagship models, Llama 4 Scout and Llama 4 Maverick, and offers a preview of the yet-to-be-released Llama 4 Behemoth. The announcement marks a significant leap forward for Meta in the competitive AI landscape.

Llama 4 Scout is a compact model featuring 17 billion active parameters and 16 experts, designed to operate efficiently on a single NVIDIA H100 GPU using Int4 quantization. It boasts an impressive context window of 10 million tokens, allowing it to perform complex tasks such as multi-document summarization and reasoning over extensive codebases. According to Meta, Llama 4 Scout outperforms not only its predecessors but also rivals like Google’s Gemma 3, Gemini 2.0 Flash-Lite, and Mistral 3.1 across various widely reported benchmarks.
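A quick back-of-envelope check makes the single-GPU claim plausible. The sketch below uses the figures quoted above (109 billion total parameters, Int4 quantization, 80 GB of memory on an H100); it deliberately ignores KV-cache and activation memory, which grow substantially at long context lengths, so it is an estimate of the weight footprint only.

```python
# Rough check: do Llama 4 Scout's Int4-quantized weights fit on one H100?
# Figures from the article; KV cache and activations are not counted.

TOTAL_PARAMS = 109e9          # Scout's total parameter count
BYTES_PER_PARAM_INT4 = 0.5    # Int4 = 4 bits = half a byte per weight
H100_MEMORY_GB = 80           # HBM on a single NVIDIA H100

weights_gb = TOTAL_PARAMS * BYTES_PER_PARAM_INT4 / 1e9
verdict = "fits" if weights_gb < H100_MEMORY_GB else "exceeds"
print(f"Quantized weights: ~{weights_gb:.1f} GB ({verdict} one H100)")
# → Quantized weights: ~54.5 GB (fits one H100)
```

At roughly 54.5 GB, the quantized weights leave headroom on an 80 GB card, though serving the full 10-million-token context would require far more memory for the KV cache.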

Meanwhile, Llama 4 Maverick, which also contains 17 billion active parameters, steps up the game with 128 experts and a total of 400 billion parameters. This model is engineered for top-tier multimodal performance, making it ideal for applications that require processing both text and visual data. Meta claims that Maverick surpasses leading competitors such as GPT-4o and Gemini 2.0 Flash on several benchmarks and achieves results comparable to the larger DeepSeek v3 in reasoning and coding tasks. Impressively, an experimental chat version of Maverick has already achieved an ELO score of 1417 on LMArena, showcasing its capabilities in real-time applications.

The Llama 4 models employ a mixture-of-experts (MoE) architecture, a first for the Llama series. This innovative structure activates only a subset of total parameters per token, enhancing efficiency and performance. For context, Llama 4 Scout has a total of 109 billion parameters, while Maverick scales up to 400 billion.
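The routing idea behind MoE can be shown in a few lines. The toy layer below is an illustration of top-k expert routing in general, not Meta's actual implementation: a learned router scores every expert for each token, only the highest-scoring experts are evaluated, and their outputs are combined with softmax weights. This is why compute tracks the active parameter count (17 billion) rather than the total (109 or 400 billion).

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts, top_k = 8, 16, 1   # Scout-like: 16 experts, 1 active

def moe_layer(x, experts, router_w, top_k=1):
    """Route one token embedding x through only the top_k experts."""
    logits = router_w @ x                    # router score per expert
    chosen = np.argsort(logits)[-top_k:]     # indices of the top-k experts
    weights = np.exp(logits[chosen])
    weights /= weights.sum()                 # softmax over chosen experts only
    # Only the chosen experts' matmuls are ever computed
    out = sum(w * (experts[i] @ x) for w, i in zip(weights, chosen))
    return out, chosen

experts = [rng.standard_normal((d_model, d_model)) for _ in range(n_experts)]
router_w = rng.standard_normal((n_experts, d_model))

token = rng.standard_normal(d_model)
out, chosen = moe_layer(token, experts, router_w, top_k)
print(f"Activated {len(chosen)}/{n_experts} experts for this token")
# → Activated 1/16 experts for this token
```

Different tokens route to different experts, so the full parameter pool contributes to model capacity while each token pays only for its active slice.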

In addition to these releases, Meta is also developing Llama 4 Behemoth, a massive teacher model with 288 billion active parameters and nearly two trillion total parameters. Although still in training, Meta reports that Behemoth is on track to outperform notable competitors like GPT-4.5, Claude 3.7 Sonnet, and Gemini 2.0 Pro on STEM-focused benchmarks. Behemoth will play a crucial role in distilling knowledge into the Scout and Maverick models, although it is not yet available for public release.

Developers eager to utilize these cutting-edge AI models can download Llama 4 Scout and Maverick starting today from llama.com and Hugging Face. Additionally, users can experience Meta AI powered by Llama 4 through platforms such as WhatsApp, Messenger, Instagram Direct, and the Meta.AI website. More detailed insights and future plans for Llama 4 Behemoth will be shared at the upcoming LlamaCon conference on April 29, 2025.

In a related development, Groq, a pioneer in AI inference, has also launched the Llama 4 Scout and Maverick models on its GroqCloud™ platform, providing developers and enterprises with day-zero access to these advanced open-source AI models. Groq says its infrastructure enables these models to go live without delays, tuning, or bottlenecks, delivering what the company claims is the lowest cost per token in the industry.

Jonathan Ross, CEO and Founder of Groq, stated, "We built Groq to drive the cost of compute to zero. Our chips are designed for inference, which means developers can run models like Llama 4 faster, cheaper, and without compromise." The pricing for Llama 4 models on GroqCloud is as follows: Llama 4 Scout costs $0.11 per million input tokens and $0.34 per million output tokens, with a blended rate of $0.13 per million tokens. Llama 4 Maverick is priced at $0.50 per million input tokens and $0.77 per million output tokens, resulting in a blended rate of $0.53 per million tokens.
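Those per-million-token prices make workload costs easy to estimate. The helper below uses the rates quoted above; the model keys ("llama-4-scout", "llama-4-maverick") are illustrative labels for this sketch, not necessarily Groq's actual API model IDs.

```python
# Cost estimate from the quoted GroqCloud prices (USD per million tokens).
# Model keys here are illustrative labels, not confirmed API identifiers.

PRICES = {
    "llama-4-scout":    {"input": 0.11, "output": 0.34},
    "llama-4-maverick": {"input": 0.50, "output": 0.77},
}

def workload_cost(model, input_tokens, output_tokens):
    """Total USD cost for a workload at the listed per-million-token rates."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1e6

# Example: 3M input tokens and 1M output tokens on Scout
print(f"${workload_cost('llama-4-scout', 3_000_000, 1_000_000):.2f}")
# → $0.67  (3 × $0.11 + 1 × $0.34)
```

At these rates, even long-context workloads in the millions of tokens stay under a dollar on Scout, which is the economics Groq is highlighting.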

Groq’s platform allows developers to run cutting-edge multimodal workloads while keeping costs low and latency predictable. The Llama 4 Scout model is noted for its strong general-purpose capabilities, particularly in summarization, reasoning, and coding, achieving speeds of over 460 tokens per second on Groq. In contrast, Llama 4 Maverick is optimized for multilingual and multimodal tasks, making it suitable for chat, assistant, and creative applications.
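The quoted throughput translates directly into response latency. The sketch below is simple arithmetic on the 460 tokens-per-second figure cited above and assumes sustained generation at that rate, ignoring prompt-processing time.

```python
# Latency estimate from Groq's quoted Scout throughput of 460 tokens/second.
# Assumes sustained generation; prompt-processing time is not included.

TOKENS_PER_SECOND = 460

def generation_time(output_tokens, tps=TOKENS_PER_SECOND):
    """Seconds to generate output_tokens at a steady tps rate."""
    return output_tokens / tps

print(f"1,000-token answer:    ~{generation_time(1_000):.1f} s")
print(f"50,000-token document: ~{generation_time(50_000):.0f} s")
# → 1,000-token answer:    ~2.2 s
# → 50,000-token document: ~109 s
```

A typical chat reply arrives in a couple of seconds at this rate, which is what makes the speed figure relevant for the assistant and real-time use cases mentioned above.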

As the AI landscape continues to evolve, Meta's Llama 4 models represent a significant advancement in open-source AI technology. Despite some criticism regarding the licensing restrictions—particularly the requirement for commercial entities with over 700 million monthly active users to seek permission from Meta before utilizing its models—the company maintains its commitment to open-source principles.

Meta’s approach to AI development, particularly with the introduction of the MoE architecture, suggests a shift towards more efficient and powerful models that can handle a wide range of applications. The upcoming LlamaCon will likely provide further insights into Meta's future directions and innovations in AI technology.

In summary, the launch of the Llama 4 models by Meta and their immediate availability on GroqCloud marks a pivotal moment in the AI industry. As developers and enterprises explore these new capabilities, the potential applications for Llama 4 Scout and Maverick are vast, promising to enhance everything from chatbots to complex data analysis.