Meta drops Llama 4 models, with «mixture of experts»

The Llama 4 models are off to a great start on the benchmarks, run on modest hardware, and are very cost efficient.
The Llama 4 models seem tailor-made for STEM benchmarks, run on modest hardware, and are very cost efficient. (Picture: LadyDragonflyCC, CC BY 2.0)
The latest models out of Meta, the Llama 4 Maverick and the Scout, aren’t reasoning models, but instead put in extra work by using a panel of «experts».

The models are rolling out right now on WhatsApp, Messenger and Instagram Direct, where they are free for everyone with an account.

Will answer «contentious» questions
According to TechCrunch, the new models will answer questions previous models wouldn’t, responding to «debated» questions and «contentious» prompts «without judgement:»

— We’re continuing to make Llama more responsive so that it answers more questions, can respond to a variety of different viewpoints, and doesn’t favor some views over others, a Meta spokesperson told the website.

Best non-reasoning model, shines on cost
The Llama 4 generation scores pretty high on the benchmarks, but loses out to higher-performing reasoning models like GPT-4.5 and Google’s Gemini 2.5 Pro.

Where they do shine, though, is when you compare them with non-reasoning models like DeepSeek and factor in cost, which runs at $0.19 to $0.49 per million tokens.

Stunning context length for Scout
The maximum context length is 1 million tokens for the Maverick, while the Scout boasts a whopping 10 million.

A token is roughly a broken-down piece of a word, which means that at 10 million tokens you can scan documents of enormous length.

In comparison, running 1 million tokens through GPT-4o costs about $4.40, for arguably slightly lower performance.
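
As a back-of-the-envelope illustration (the per-million-token prices are the ones quoted above; the workload size is a made-up example), here is roughly how that difference adds up:

```python
# Rough cost comparison using the per-million-token prices cited in this
# article ($0.19–$0.49 for Llama 4, roughly $4.40 for GPT-4o).
# The number of tokens to process is a hypothetical example.
PRICE_PER_MILLION_TOKENS = {
    "Llama 4 (low end)": 0.19,
    "Llama 4 (high end)": 0.49,
    "GPT-4o (approx.)": 4.40,
}

tokens_to_process = 10_000_000  # e.g. a very large pile of documents

for model, price in PRICE_PER_MILLION_TOKENS.items():
    cost = tokens_to_process / 1_000_000 * price
    print(f"{model}: ${cost:.2f} for {tokens_to_process:,} tokens")
```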

The mixture of experts breaks a task down for a group of experts, and then reassembles the results into the answer. (Picture: Meta)

Panel of experts «reasoning»
Then there is the panel of experts, which does give the models some kind of reasoning after all.

The «mixture of experts» architecture breaks a data processing task down into smaller units and has «experts» solve them, with a shared expert distributing the work and putting the pieces back together to present a coherent answer to the user.

This is used both in training and in answering queries, and Meta says it allows the models to be trained and run much more cost efficiently.
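
For the curious, here is a minimal, hypothetical sketch of the general mixture-of-experts idea in Python. The sizes, weights and routing scheme are made up for illustration, and this is not Meta’s actual implementation:

```python
# Minimal sketch of the general "mixture of experts" idea (illustrative only,
# not Meta's Llama 4 implementation). A router scores the experts for each
# input, only the top-scoring experts are run, and their outputs are blended
# into a single result.
import numpy as np

rng = np.random.default_rng(0)

D = 8            # hypothetical hidden size
NUM_EXPERTS = 4  # hypothetical number of experts
TOP_K = 2        # how many experts handle each token

# Each "expert" here is just a tiny feed-forward layer with random weights.
expert_weights = [rng.normal(size=(D, D)) for _ in range(NUM_EXPERTS)]
router_weights = rng.normal(size=(D, NUM_EXPERTS))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def moe_forward(token_vector):
    # 1. The router scores every expert for this token.
    scores = softmax(token_vector @ router_weights)
    # 2. Only the top-k experts are actually run (this is what saves compute).
    top_experts = np.argsort(scores)[-TOP_K:]
    # 3. Their outputs are blended, weighted by the router's scores.
    output = np.zeros(D)
    for i in top_experts:
        output += scores[i] * np.tanh(token_vector @ expert_weights[i])
    return output

print(moe_forward(rng.normal(size=D)))
```

The point of the sparsity is that each token only touches a couple of experts, so the model can have a huge total parameter count while spending far less compute per token.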

Very low hardware requirements
The models also shine in how little hardware they need relative to the performance they deliver. The Scout runs smoothly on a single Nvidia H100 card, while the Maverick needs something more powerful, like an Nvidia DGX H100 system.

This means it’s within the reach of many small companies, with an H100 card costing about $40,000, if you can get one.

Not available in the EU
The license for these models allows free use for anyone with fewer than 700 million users. Crossing that threshold means you need direct permission from Meta.

Also, the multimodal features are limited to English in the US, and none of the models are available in the EU, possibly due to its more stringent AI and data protection laws.

If you are outside of Europe, you can try them out on the web here, and you can download the models from Hugging Face and from Meta.

Meta is also teasing another model, the Behemoth, with 2 trillion total parameters, which would hit like a bomb when launched, but it is only announced as «in testing» for now. We’ll have to wait and see on that one.

UPDATE: The Llama 4 models have been out in the wild for a few hours, and apparently some users are disappointed with their emotional intelligence, storytelling and creativity, with a score of just 36 on a longform creative writing test. That’s the lowest on record.

It does, however, excel on STEM benchmarks, according to Meta (have a look at their announcement post).

Read more: Meta’s announcement, TechCrunch, Engadget and The Verge.

Author: Tor Fosheim · Posted on 6 April 2025 · Tags: AI, llama, meta
