Meta releases its biggest ‘open’ AI model yet

people walking past Meta signage
Image Credits: TOBIAS SCHWARZ/AFP / Getty Images

Meta’s latest open source AI model is its biggest yet.

Today, Meta said it is releasing Llama 3.1 405B, a model containing 405 billion parameters. Parameters roughly correspond to a model’s problem-solving skills, and models with more parameters generally perform better than those with fewer parameters.

At 405 billion parameters, Llama 3.1 405B isn’t the absolute largest open source model out there, but it’s the biggest in recent years. Trained using 16,000 Nvidia H100 GPUs, it also benefits from newer training and development techniques that Meta claims make it competitive with leading proprietary models like OpenAI’s GPT-4o and Anthropic’s Claude 3.5 Sonnet (with a few caveats).

As with Meta’s previous models, Llama 3.1 405B is available to download or use on cloud platforms like AWS, Azure and Google Cloud. It’s also being used on WhatsApp and Meta.ai, where it’s powering a chatbot experience for U.S.-based users.

New and improved

Like other open and closed source generative AI models, Llama 3.1 405B can perform a range of different tasks, from coding and answering basic math questions to summarizing documents in eight languages (English, German, French, Italian, Portuguese, Hindi, Spanish and Thai). It’s text-only, meaning that it can’t, for example, answer questions about an image, but most text-based workloads — think analyzing files like PDFs and spreadsheets — are within its purview.

Meta wants to make it known that it is experimenting with multimodality. In a paper published today, researchers at the company write that they’re actively developing Llama models that can recognize images and videos, and understand (and generate) speech. Still, these models aren’t yet ready for public release.

To train Llama 3.1 405B, Meta used a dataset of 15 trillion tokens dating up to 2024 (tokens are parts of words that models can more easily internalize than whole words; at a common rule of thumb of about 0.75 English words per token, 15 trillion tokens translates to a mind-boggling 11 trillion or so words). It’s not a new training set per se, since Meta used the base set to train earlier Llama models, but the company claims it refined its curation pipelines for data and adopted “more rigorous” quality assurance and data filtering approaches in developing this model.
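
The token-to-word conversion is simple arithmetic. The sketch below assumes a rule of thumb of about 0.75 English words per token; that ratio is an assumption for illustration, not a figure from Meta, and it varies by tokenizer and language. The name `estimate_words` is hypothetical.

```python
# Assumed rule of thumb: roughly 0.75 English words per token.
# This ratio is illustrative and not taken from Meta's paper.
WORDS_PER_TOKEN = 0.75


def estimate_words(num_tokens: int, words_per_token: float = WORDS_PER_TOKEN) -> int:
    """Return a rough word count for a given token count."""
    return round(num_tokens * words_per_token)


training_tokens = 15_000_000_000_000  # 15 trillion tokens, as reported by Meta
print(f"{estimate_words(training_tokens):,} words (rough estimate)")
```

The result is only as good as the assumed ratio, so it is best read as an order-of-magnitude figure.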

The company also used synthetic data (data generated by other AI models) to fine-tune Llama 3.1 405B. Most major AI vendors, including OpenAI and Anthropic, are exploring applications of synthetic data to scale up their AI training, but some experts believe that synthetic data should be a last resort due to its potential to exacerbate model bias.

For its part, Meta insists that it “carefully balance[d]” Llama 3.1 405B’s training data, but declined to reveal exactly where the data came from (outside of webpages and public web files). Many generative AI vendors see training data as a competitive advantage and so keep it and any information pertaining to it close to the chest. But training data details are also a potential source of IP-related lawsuits, another disincentive for companies to reveal much. 

Meta Llama 3.1
Image Credits: Meta

In the aforementioned paper, Meta researchers wrote that compared to earlier Llama models, Llama 3.1 405B was trained on an increased mix of non-English data (to improve its performance on non-English languages), more “mathematical data” and code (to improve the model’s mathematical reasoning skills), and recent web data (to bolster its knowledge of current events).

Recent reporting by Reuters revealed that Meta at one point used copyrighted e-books for AI training despite its own lawyers’ warnings. The company controversially trains its AI on Instagram and Facebook posts, photos and captions, and makes it difficult for users to opt out. What’s more, Meta, along with OpenAI, is the subject of an ongoing lawsuit brought by authors, including comedian Sarah Silverman, over the companies’ alleged unauthorized use of copyrighted data for model training.

“The training data, in many ways, is sort of like the secret recipe and the sauce that goes into building these models,” Ragavan Srinivasan, VP of AI program management at Meta, told TechCrunch in an interview. “And so from our perspective, we’ve invested a lot in this. And it is going to be one of these things where we will continue to refine it.”

Bigger context and tools

Llama 3.1 405B has a larger context window than previous Llama models: 128,000 tokens, or roughly 96,000 English words at about 0.75 words per token, which is the length of a book of a few hundred pages. A model’s context, or context window, refers to the input data (e.g. text) that the model considers before generating output (e.g. additional text).

One of the advantages of models with larger contexts is that they can summarize longer text snippets and files. When powering chatbots, such models are also less likely to forget topics that were recently discussed.
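
To see why a larger window makes a chatbot less forgetful, consider how an application might trim a conversation to fit. The sketch below is a simplified illustration, not how Meta's products work: it keeps the newest messages that fit a token budget and drops the rest. The names `estimate_tokens` and `trim_history` are hypothetical, and the token count is approximated from word counts using an assumed 0.75 words per token rather than a real tokenizer.

```python
import math


def estimate_tokens(text: str, words_per_token: float = 0.75) -> int:
    """Approximate a token count from whitespace-separated words (assumed ratio)."""
    return math.ceil(len(text.split()) / words_per_token)


def trim_history(messages: list[str], window_tokens: int) -> list[str]:
    """Keep the most recent messages whose estimated tokens fit in the window."""
    kept: list[str] = []
    used = 0
    # Walk from newest to oldest, stopping at the first message that does not fit.
    for message in reversed(messages):
        cost = estimate_tokens(message)
        if used + cost > window_tokens:
            break
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


history = ["my dog is named Rex", "I live in Lisbon", "what is my dog called"]
print(trim_history(history, window_tokens=14))       # a tiny window drops the oldest message
print(trim_history(history, window_tokens=128_000))  # a large window keeps everything
```

With the tiny budget, the message that names the dog is dropped before the question about it arrives; with a 128,000-token budget, nothing is lost.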

Two other new, smaller models Meta unveiled today, Llama 3.1 8B and Llama 3.1 70B — updated versions of the company’s Llama 3 8B and Llama 3 70B models released in April — also have 128,000-token context windows. The previous models’ contexts topped out at 8,000 tokens, which makes this upgrade fairly substantial — assuming the new Llama models can effectively reason across all that context.

All of the Llama 3.1 models can use third-party tools, apps and APIs to complete tasks, like rival models from Anthropic and OpenAI. Out of the box, they’re trained to tap Brave Search to answer questions about recent events, the Wolfram Alpha API for math- and science-related queries, and a Python interpreter for validating code. In addition, Meta claims the Llama 3.1 models can use certain tools they haven’t seen before — to an extent.
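
In broad strokes, tool use means the model emits a structured request and the surrounding application runs the matching function, then feeds the result back. The sketch below shows only the application side. Everything in it is an assumption for illustration: the JSON shape is not Llama 3.1's actual tool-call format, and the stub functions do not call Brave Search, Wolfram Alpha or any real service.

```python
import json


def web_search(query: str) -> str:
    """Stand-in for a search tool; a real one would call a search API."""
    return f"[search results for: {query}]"


def add_numbers(a: float, b: float) -> float:
    """Stand-in for a math tool."""
    return a + b


# Hypothetical registry mapping tool names to local functions.
TOOLS = {"web_search": web_search, "add_numbers": add_numbers}


def dispatch(tool_call_json: str) -> str:
    """Run the tool named in a JSON request and return its result as text."""
    call = json.loads(tool_call_json)
    name = call["name"]
    if name not in TOOLS:
        return f"error: unknown tool {name!r}"
    return str(TOOLS[name](**call["arguments"]))


# Pretend the model produced this request while answering a question.
model_output = '{"name": "add_numbers", "arguments": {"a": 2, "b": 3}}'
print(dispatch(model_output))
```

Returning an error string for an unknown tool, rather than raising, lets the application pass the failure back to the model so it can try something else.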

Building an ecosystem

If benchmarks are to be believed (not that benchmarks are the be-all and end-all in generative AI), Llama 3.1 405B is a very capable model indeed. That’d be a good thing, considering some of the painfully obvious limitations of previous-generation Llama models.

Llama 3.1 405B performs on par with OpenAI’s GPT-4, and achieves “mixed results” compared to GPT-4o and Claude 3.5 Sonnet, according to human evaluators that Meta hired, the paper notes. While Llama 3.1 405B is better at executing code and generating plots than GPT-4o, its multilingual capabilities are overall weaker, and it trails Claude 3.5 Sonnet in programming and general reasoning.

And because of its size, it needs beefy hardware to run. Meta recommends at least a server node.

That’s perhaps why Meta’s pushing its smaller new models, Llama 3.1 8B and Llama 3.1 70B, for general-purpose applications like powering chatbots and generating code. Llama 3.1 405B, the company says, is better reserved for model distillation — the process of transferring knowledge from a large model to a smaller, more efficient model — and generating synthetic data to train (or fine-tune) alternative models.
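
Distillation can take several forms, and Meta does not specify which one it has in mind here. One classic formulation trains the small "student" model to match the large "teacher" model's softened output probabilities. The sketch below computes that mismatch for a single prediction; the function names and logits are made up for illustration, and a real training loop would minimize this value with a deep learning framework.

```python
import math


def softmax(logits: list[float], temperature: float = 1.0) -> list[float]:
    """Convert raw scores into probabilities, softened by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits: list[float],
                      student_logits: list[float],
                      temperature: float = 2.0) -> float:
    """KL divergence from the teacher's softened distribution to the student's."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


teacher = [1.0, 2.0, 3.0]  # made-up scores over three candidate tokens
print(distillation_loss(teacher, [1.0, 2.0, 3.0]))  # student agrees with the teacher
print(distillation_loss(teacher, [3.0, 2.0, 1.0]))  # student disagrees
```

The loss is zero when the student reproduces the teacher's distribution and grows as the two diverge, which is the signal a training loop would use to pull the student toward the teacher.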

To encourage the synthetic data use case, Meta said it has updated Llama’s license to let developers use outputs from the Llama 3.1 model family to develop third-party generative AI models (whether that’s a wise idea is up for debate). Importantly, the license still constrains how developers can deploy Llama models: App developers with more than 700 million monthly users must request a special license from Meta that the company will grant at its discretion.

That change in licensing around outputs, which allays a major criticism of Meta’s models within the AI community, is a part of the company’s aggressive push for mindshare in generative AI.

Alongside the Llama 3.1 family, Meta is releasing what it’s calling a “reference system” and new safety tools — several of these block prompts that might cause Llama models to behave in unpredictable or undesirable ways — to encourage developers to use Llama in more places. The company is also previewing and seeking comment on the Llama Stack, a forthcoming API for tools that can be used to fine-tune Llama models, generate synthetic data with Llama and build “agentic” applications — apps powered by Llama that can take action on a user’s behalf.

“[What] We have heard repeatedly from developers is an interest in learning how to actually deploy [Llama models] in production,” Srinivasan said. “So we’re trying to start giving them a bunch of different tools and options.”

Play for market share

In an open letter published this morning, Meta CEO Mark Zuckerberg lays out a vision for the future in which AI tools and models reach the hands of more developers around the world, ensuring people have access to the “benefits and opportunities” of AI.

It’s couched very philanthropically, but implicit in the letter is Zuckerberg’s desire that these tools and models be of Meta’s making.

Meta’s racing to catch up to companies like OpenAI and Anthropic, and it is employing a tried-and-true strategy: give tools away for free to foster an ecosystem and then slowly add products and services, some paid, on top. Spending billions of dollars on models that it can then commoditize also has the effect of driving down Meta competitors’ prices and spreading the company’s version of AI broadly. It also lets the company incorporate improvements from the open source community into its future models.

Llama certainly has developers’ attention. Meta claims Llama models have been downloaded over 300 million times, and more than 20,000 Llama-derived models have been created so far.

Make no mistake, Meta’s playing for keeps. It is spending millions on lobbying regulators to come around to its preferred flavor of “open” generative AI. None of the Llama 3.1 models solve the intractable problems with today’s generative AI tech, like its tendency to make things up and regurgitate problematic training data. But they do advance one of Meta’s key goals: becoming synonymous with generative AI.

There are costs to this. In the research paper, the co-authors — echoing Zuckerberg’s recent comments — discuss energy-related reliability issues with training Meta’s ever-growing generative AI models.

“During training, tens of thousands of GPUs may increase or decrease power consumption at the same time, for example, due to all GPUs waiting for checkpointing or collective communications to finish, or the startup or shutdown of the entire training job,” they write. “When this happens, it can result in instant fluctuations of power consumption across the data center on the order of tens of megawatts, stretching the limits of the power grid. This is an ongoing challenge for us as we scale training for future, even larger Llama models.”

One hopes that training those larger models won’t force more utilities to keep old coal-burning power plants around.
