Meta drops Llama 4 models, with «mixture of experts»

The Llama 4 models seem tailor-made for STEM benchmarks, run on little hardware, and are very cost efficient. (Picture: LadyDragonflyCC, CC BY 2.0)

The latest models out of Meta, the Llama 4 Maverick and the Scout, aren’t reasoning models, but put in extra work by using a panel of «experts» instead.

The models are rolling out right now on WhatsApp, Messenger and Instagram Direct, where they are free for everyone with an account.

Will answer «contentious» questions
According to TechCrunch, the new models will answer questions previous models wouldn’t, responding to «debated» questions and «contentious» prompts «without judgement:»

«We’re continuing to make Llama more responsive so that it answers more questions, can respond to a variety of different viewpoints, and doesn’t favor some views over others,» a Meta spokesperson told the website.

Best non-reasoning model, shines on cost
The Llama 4 generation scores pretty high on the benchmarks, but loses out to higher-performance reasoning models like ChatGPT 4.5 and Google’s Gemini 2.5 Advanced.

Where they do shine, though, is when you pit them against non-reasoning models, like DeepSeek, and factor in costs, which run from $0.19 to $0.49 per one million tokens.

Stunning token length for Scout
The maximum token lengths are 1 million for the Maverick and a whopping 10 million for the Scout.

A token is roughly a broken-down piece of a word, which means that with 10 million tokens, you get to scan documents of great length.

In comparison, running 1 million tokens on GPT-4o will cost about $4.40 for arguably slightly lower performance.
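
To put those prices side by side, here is a minimal Python sketch of the arithmetic. The prices are the ones quoted in this article; the function name cost_usd and the 10-million-token job size are only illustrative assumptions, and real pricing depends on the provider and on the mix of input and output tokens.

```python
# Prices in US dollars per one million tokens, as quoted in this article.
PRICES_PER_MILLION = {
    "Llama 4 (low end)": 0.19,
    "Llama 4 (high end)": 0.49,
    "GPT-4o": 4.40,
}


def cost_usd(tokens: int, price_per_million: float) -> float:
    """Cost of processing `tokens` tokens at a given per-million-token price."""
    return tokens / 1_000_000 * price_per_million


if __name__ == "__main__":
    job_tokens = 10_000_000  # hypothetical workload, chosen only for illustration
    for name, price in PRICES_PER_MILLION.items():
        print(f"{name}: ${cost_usd(job_tokens, price):.2f}")
```

At these quoted prices, the same job comes out roughly 9 to 23 times cheaper on Llama 4 than on GPT-4o.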

The mixture of experts splits a task among a group of experts, and then assembles their results into the answer. (Picture: Meta)

Panel of experts «reasoning»
Then there is the panel of experts, which does show some kind of reasoning in the model after all.

The «mixture of experts» architecture breaks a data processing task down into smaller units and routes each of them to specialised «experts», alongside a shared expert, before the results are put together again to present a coherent answer to the user.

This is used in both training and in answering queries, and Meta says it allows the models to be trained and function much more cost efficiently.
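
As a rough illustration of the idea, here is a toy Python sketch of a mixture-of-experts layer as it is commonly described: a small «router» scores each token, sends it to the best-matching expert, and adds the output of a shared expert that sees every token. This is not Meta's code. The sizes, the random weights and all names (router, experts, shared_expert, moe_layer) are assumptions made up for the example.

```python
import math
import random

random.seed(0)  # fixed seed so the toy weights are reproducible

DIM = 4          # size of each token vector (toy value)
NUM_EXPERTS = 3  # number of routed experts (toy value)


def random_matrix(rows, cols):
    """A rows x cols matrix of small random weights."""
    return [[random.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


router = random_matrix(NUM_EXPERTS, DIM)   # scores each expert for a token
experts = [random_matrix(DIM, DIM) for _ in range(NUM_EXPERTS)]
shared_expert = random_matrix(DIM, DIM)    # processes every token


def moe_layer(token):
    """Return (output vector, index of the chosen expert) for one token."""
    gate = softmax(matvec(router, token))
    chosen = max(range(NUM_EXPERTS), key=lambda i: gate[i])  # top-1 routing
    routed_out = matvec(experts[chosen], token)  # only this expert runs
    shared_out = matvec(shared_expert, token)
    # Combine: shared output plus the chosen expert's output, weighted by its gate.
    output = [s + gate[chosen] * r for s, r in zip(shared_out, routed_out)]
    return output, chosen


if __name__ == "__main__":
    out, idx = moe_layer([1.0, 0.0, -1.0, 0.5])
    print("chosen expert:", idx)
    print("output:", [round(v, 3) for v in out])
```

Only one of the routed experts runs for any given token, so most of the layer's weights sit idle on each step. That is where the cost savings come from.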

Very low hardware requirements
The models also shine in the hardware they need to run compared to the performance they deliver. The Scout requires only a single Nvidia H100 card to run smoothly, while the Maverick needs something more powerful, like the Nvidia DGX H100.

This puts it within the reach of many small companies, with the H100 card costing about $40,000, if you can get one.
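
A back-of-the-envelope calculation hints at why a single card can be enough. The Python sketch below estimates the memory needed just to hold a model's weights at different precisions. The figures are assumptions not stated in this article: about 109 billion total parameters for the Scout (taken from Meta's announcement) and 80 GB of memory on an H100. It ignores the extra memory needed for the context and for activations, so treat it as a lower bound.

```python
H100_MEMORY_GB = 80         # assumption: the 80 GB variant of the H100
SCOUT_PARAMS_BILLION = 109  # assumption: total parameters per Meta's announcement


def weights_gb(params_billion: float, bits_per_param: int) -> float:
    """Approximate memory for the weights alone, in gigabytes (10^9 bytes)."""
    return params_billion * bits_per_param / 8


def fits_on_one_card(params_billion: float, bits_per_param: int) -> bool:
    """True if the weights alone fit in a single card's memory."""
    return weights_gb(params_billion, bits_per_param) <= H100_MEMORY_GB


if __name__ == "__main__":
    for bits in (16, 8, 4):
        size = weights_gb(SCOUT_PARAMS_BILLION, bits)
        verdict = "fits" if fits_on_one_card(SCOUT_PARAMS_BILLION, bits) else "does not fit"
        print(f"{bits}-bit weights: {size:.1f} GB, {verdict} on one card")
```

Under these assumptions, the weights only fit on one 80 GB card when stored at 4 bits per parameter, which is in line with Meta's statement that the Scout fits on a single H100 with Int4 quantization.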

Not available in the EU
The license for these models provides free use for anyone with fewer than 700 million users. Crossing this threshold means you need direct permission from Meta.

Also, the multimodal features are limited to the US and to English, and none of the models are available in the EU, possibly due to more stringent AI and data collection laws.

If you are outside of Europe, you can try them out on the web. You can also download the models from Hugging Face and from Meta.

Meta is also teasing another model, the Behemoth, with 2 trillion total parameters, which would hit like a bomb when launched, but is only announced as «in testing» for now. We’ll have to wait and see on that one.

UPDATE: The Llama 4 models have been played with in the wild for a few hours, and apparently some users are disappointed with their emotional intelligence, storytelling and creativity. They score just 36 on the Longform Creative Writing test, the lowest on record.

They do, however, excel on STEM benchmarks, according to Meta (have a look at their announcement post).

Read more: Meta’s announcement, TechCrunch, Engadget and The Verge.
