Meta training next-gen AI model on industry's largest GPU cluster
Meta CEO Mark Zuckerberg has revealed that the company's upcoming Llama 4 AI model is being trained on an unprecedentedly large cluster of GPUs. Speaking during an earnings call, Zuckerberg said that Llama 4's development is well underway and its initial launch is expected early next year. "We're training the Llama 4 models on a cluster that is bigger than 100,000 H100s, or bigger than anything that I've seen reported for what others are doing," he said.
Leader in AI training scale
The scale of AI training is widely seen as critical to developing more sophisticated AI models, and Meta currently appears to lead on this front, with other major players also working toward compute clusters of more than 100,000 high-end chips. Earlier this year, Meta and NVIDIA revealed details of clusters of some 25,000 H100s employed for Llama 3 development, while Elon Musk's xAI venture partnered with X and NVIDIA to establish a cluster of 100,000 H100s.
Meta's unique approach to AI development
Meta's approach to AI development differs from most of its rivals: it allows the Llama models to be downloaded in full, free of charge, whereas OpenAI, Google, and most other big players only provide access through an API. Although Meta describes Llama as "open source," the license does place some restrictions on commercial use, and the company does not disclose details of the models' training.
Llama 4 development poses engineering challenges
The development of Llama 4 will likely come with unique engineering challenges and require enormous amounts of energy. According to estimates by SemiAnalysis, a cluster of 100,000 H100 chips requires around 150 megawatts of power, roughly five times the power consumed by El Capitan, the largest national lab supercomputer in the US. Despite these challenges, Meta plans to invest up to $40 billion in capital this year on data centers and other infrastructure.
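As a rough sanity check on the SemiAnalysis figure, the cluster's power draw can be estimated from the per-GPU wattage. The numbers below are assumptions for illustration, not from the article: an H100 SXM board is rated at roughly 700 W, and total facility consumption (host CPUs, networking, cooling) is often modeled with an overhead multiplier of around 2x.

```python
# Back-of-envelope estimate of the power draw of a 100,000-GPU cluster.
# Assumed figures (not from the article): ~700 W per H100, and a ~2x
# facility overhead multiplier covering CPUs, networking, and cooling.
NUM_GPUS = 100_000
GPU_WATTS = 700      # assumed per-GPU power draw
OVERHEAD = 2.0       # assumed facility overhead multiplier

gpu_mw = NUM_GPUS * GPU_WATTS / 1e6   # megawatts for the GPUs alone
total_mw = gpu_mw * OVERHEAD          # megawatts including overhead

print(f"GPUs alone: {gpu_mw:.0f} MW; with overhead: {total_mw:.0f} MW")
# GPUs alone come to 70 MW; with overhead, ~140 MW, in the same
# ballpark as the ~150 MW SemiAnalysis estimate cited above.
```

Under these assumptions the estimate lands near 140 MW, which is consistent with the 150 MW figure quoted in the article.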
Meta's open-source AI strategy stirs debate
Meta's open-source approach to AI has sparked debate among experts. Some are concerned that freely available, powerful AI models could be misused for cyberattacks or to automate the design of chemical or biological weapons. Despite these concerns, Zuckerberg remains confident about the open source strategy. He believes that "open source will be the most cost effective, customizable, trustworthy, performant, and easiest to use option that is available to developers."