Tesla Dojo: Elon Musk’s big plan to build an AI supercomputer, explained

For years, Elon Musk has talked about Dojo — the AI supercomputer that will be the cornerstone of Tesla’s AI ambitions. It’s important enough to Musk that he recently said the company’s AI team is going to “double down” on Dojo as Tesla gears up to reveal its robotaxi in October. 

But what exactly is Dojo? And why is it so critical to Tesla’s long-term strategy?

In short: Dojo is Tesla’s custom-built supercomputer that’s designed to train its “Full Self-Driving” neural networks. Beefing up Dojo goes hand-in-hand with Tesla’s goal to reach full self-driving and bring a robotaxi to market. FSD, which is on almost 2 million Tesla vehicles today, can perform some automated driving tasks but still requires a human to be attentive behind the wheel. 

Tesla delayed the reveal of its robotaxi, which was slated for August, to October, but both Musk’s public rhetoric and information from sources inside Tesla tell us that the goal of autonomy isn’t going away.

And Tesla appears poised to spend big on AI and Dojo to reach that feat. 

Tesla’s Dojo backstory

Musk doesn’t want Tesla to be just an automaker, or even a purveyor of solar panels and energy storage systems. Instead, he wants Tesla to be an AI company, one that has cracked the code to self-driving cars by mimicking human perception. 

Most other companies building autonomous vehicle technology rely on a combination of sensors to perceive the world — like lidar, radar and cameras — as well as high-definition maps to localize the vehicle. Tesla believes it can achieve fully autonomous driving by relying on cameras alone to capture visual data and then use advanced neural networks to process that data and make quick decisions about how the car should behave. 

As Tesla’s former head of AI, Andrej Karpathy, said at the automaker’s first AI Day in 2021, the company is basically trying to build “a synthetic animal from the ground up.” (Musk had been teasing Dojo since 2019, but Tesla officially announced it at AI Day.)

Companies like Alphabet’s Waymo have commercialized Level 4 autonomous vehicles, which the SAE defines as systems that can drive themselves without human intervention under certain conditions, through a more traditional sensor and machine learning approach. Tesla has yet to produce an autonomous system that doesn’t require a human behind the wheel.

About 1.8 million people have paid the hefty price for Tesla’s FSD, which currently costs $8,000 and has been priced as high as $15,000. The pitch is that Dojo-trained AI software will eventually be pushed out to Tesla customers via over-the-air updates. The scale of FSD also means Tesla has been able to rake in millions of miles’ worth of video footage that it uses to train FSD. The idea is that the more data Tesla can collect, the closer the automaker can get to actually achieving full self-driving.

However, some industry experts say there might be a limit to the brute force approach of throwing more data at a model and expecting it to get smarter. 

“First of all, there’s an economic constraint, and soon it will just get too expensive to do that,” Anand Raghunathan, Purdue University’s Silicon Valley professor of electrical and computer engineering, told TechCrunch. Further, he said, “Some people claim that we might actually run out of meaningful data to train the models on. More data doesn’t necessarily mean more information, so it depends on whether that data has information that is useful to create a better model, and if the training process is able to actually distill that information into a better model.” 

Raghunathan said that despite these doubts, the trend toward more data appears to be here for the short term at least. And more data means more compute power is needed to store and process it all to train Tesla’s AI models. That is where Dojo, the supercomputer, comes in.

What is a supercomputer?

Dojo is Tesla’s supercomputer system that’s designed to function as a training ground for AI, specifically FSD. The name is a nod to the space where martial arts are practiced. 

A supercomputer is made up of thousands of smaller computers called nodes. Each of those nodes has its own CPU (central processing unit) and GPU (graphics processing unit). The former handles overall management of the node, and the latter does the complex stuff, like splitting tasks into multiple parts and working on them simultaneously. GPUs are essential for machine learning operations like those that power FSD training in simulation. They also power large language models, which is why the rise of generative AI has made Nvidia the most valuable company on the planet. 

Even Tesla buys Nvidia GPUs to train its AI (more on that later). 

Why does Tesla need a supercomputer?

Tesla’s vision-only approach is the main reason Tesla needs a supercomputer. The neural networks behind FSD are trained on vast amounts of driving data to recognize and classify objects around the vehicle and then make driving decisions. That means that when FSD is engaged, the neural nets have to collect and process visual data continuously at speeds that match the depth and velocity recognition capabilities of a human. 

In other words, Tesla means to create a digital duplicate of the human visual cortex and brain function. 

To get there, Tesla needs to store and process all the video data collected from its cars around the world and run millions of simulations to train its model on the data. 

Tesla appears to rely on Nvidia to power its current Dojo training computer, but it doesn’t want to have all its eggs in one basket — not least because Nvidia chips are expensive. Tesla also hopes to make something better that increases bandwidth and decreases latencies. That’s why the automaker’s AI division decided to come up with its own custom hardware program that aims to train AI models more efficiently than traditional systems. 

At that program’s core are Tesla’s proprietary D1 chips, which the company says are optimized for AI workloads.

Tell me more about these chips

Ganesh Venkataramanan, former senior director of Autopilot hardware, presenting the D1 training tile at Tesla’s 2021 AI Day.

Like Apple, Tesla believes hardware and software should be designed to work together. That’s why Tesla is working to move away from standard GPU hardware and design its own chips to power Dojo.

Tesla unveiled its D1 chip, a silicon square the size of a palm, on AI Day in 2021. The D1 has been in production since at least May of this year. The Taiwan Semiconductor Manufacturing Company (TSMC) is manufacturing the chips on a 7 nanometer process. The D1 has 50 billion transistors and a large die size of 645 square millimeters, according to Tesla. This is all to say that the D1 promises to be extremely powerful and efficient and to handle complex tasks quickly.

“We can do compute and data transfers simultaneously, and our custom ISA, which is the instruction set architecture, is fully optimized for machine learning workloads,” said Ganesh Venkataramanan, former senior director of Autopilot hardware, at Tesla’s 2021 AI Day. “This is a pure machine learning.”

The D1 is still not as powerful as Nvidia’s A100 chip, though, which is also manufactured by TSMC using a 7 nanometer process. The A100 contains 54 billion transistors and has a die size of 826 square millimeters, so it performs slightly better than Tesla’s D1. 

To get a higher bandwidth and higher compute power, Tesla’s AI team fused 25 D1 chips together into one tile to function as a unified computer system. Each tile has a compute power of 9 petaflops and 36 terabytes per second of bandwidth, and contains all the hardware necessary for power, cooling and data transfer. You can think of the tile as a self-sufficient computer made up of 25 smaller computers. Six of those tiles make up one rack, and two racks make up a cabinet. Ten cabinets make up an ExaPOD. At AI Day 2022, Tesla said Dojo would scale by deploying multiple ExaPODs. All of this together makes up the supercomputer. 
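The hierarchy described above can be tallied with a few lines of arithmetic. This is only a rough sketch: it uses the counts and the per-tile compute figure quoted in this article, assumes they multiply cleanly at each level, and the variable names are illustrative rather than anything Tesla uses.

```python
# Counts and per-tile compute as quoted in the article.
# Names are illustrative; they are not Tesla's terminology.
CHIPS_PER_TILE = 25
TILES_PER_RACK = 6
RACKS_PER_CABINET = 2
CABINETS_PER_EXAPOD = 10
PETAFLOPS_PER_TILE = 9

# Roll the hierarchy up from tile to ExaPOD.
tiles_per_exapod = TILES_PER_RACK * RACKS_PER_CABINET * CABINETS_PER_EXAPOD
chips_per_exapod = tiles_per_exapod * CHIPS_PER_TILE

# 1 exaflops = 1,000 petaflops.
exaflops_per_exapod = tiles_per_exapod * PETAFLOPS_PER_TILE / 1000

print(f"tiles per ExaPOD:    {tiles_per_exapod}")
print(f"D1 chips per ExaPOD: {chips_per_exapod}")
print(f"exaflops per ExaPOD: {exaflops_per_exapod:.2f}")
```

On these figures, one ExaPOD works out to 120 tiles and 3,000 D1 chips, for a little over 1 exaflops of peak compute, which is consistent with the "Exa" in the name.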

Tesla is also working on a next-gen D2 chip that aims to solve information flow bottlenecks. Instead of connecting the individual chips, the D2 would put the entire Dojo tile onto a single wafer of silicon. 

Tesla hasn’t confirmed how many D1 chips it has ordered or expects to receive. The company also hasn’t provided a timeline for how long it will take to get Dojo supercomputers running on D1 chips. 

In response to a June post on X that said: “Elon is building a giant GPU cooler in Texas,” Musk replied that Tesla was aiming for “half Tesla AI hardware, half Nvidia/other” over the next 18 months or so. The “other” could be AMD chips, per Musk’s comment in January.

What does Dojo mean for Tesla?

Tesla’s humanoid robot Optimus Prime II at WAIC in Shanghai, China, on July 7, 2024.

Taking control of its own chip production means that Tesla might one day be able to quickly add large amounts of compute power to AI training programs at a low cost, particularly as Tesla and TSMC scale up chip production. 

It also means that Tesla may not have to rely on Nvidia’s chips in the future, which are increasingly expensive and hard to secure. 

During Tesla’s second-quarter earnings call, Musk said that demand for Nvidia hardware is “so high that it’s often difficult to get the GPUs.” He said he was “quite concerned about actually being able to get steady GPUs when we want them, and I think this therefore requires that we put a lot more effort on Dojo in order to ensure that we’ve got the training capability that we need.” 

That said, Tesla is still buying Nvidia chips today to train its AI. In June, Musk posted on X:

Of the roughly $10B in AI-related expenditures I said Tesla would make this year, about half is internal, primarily the Tesla-designed AI inference computer and sensors present in all of our cars, plus Dojo. For building the AI training superclusters, Nvidia hardware is about 2/3 of the cost. My current best guess for Nvidia purchases by Tesla are $3B to $4B this year.

“Inference compute” refers to the AI computations performed by Tesla cars in real time and is separate from the training compute that Dojo is responsible for.
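Musk's figures can be roughly cross-checked against each other. The sketch below rests on one loud assumption that is not in the post: that the half of the budget that is not "internal" goes entirely to the AI training superclusters. The variable names are illustrative.

```python
# Figures from Musk's post, in billions of dollars.
TOTAL_AI_SPEND_B = 10.0
INTERNAL_SHARE = 0.5              # "about half is internal"
NVIDIA_SHARE_OF_TRAINING = 2 / 3  # "Nvidia hardware is about 2/3 of the cost"

# ASSUMPTION (ours, not Musk's): everything that is not internal
# is spent on the training superclusters.
training_spend_b = TOTAL_AI_SPEND_B * (1 - INTERNAL_SHARE)
nvidia_spend_b = training_spend_b * NVIDIA_SHARE_OF_TRAINING

print(f"training superclusters: ~${training_spend_b:.1f}B")
print(f"implied Nvidia spend:   ~${nvidia_spend_b:.1f}B")
```

Under that assumption, the implied Nvidia spend lands at a bit over $3 billion, inside the $3 billion to $4 billion range Musk gave.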

Dojo is a risky bet, one that Musk has hedged several times by saying that Tesla might not succeed. 

In the long run, Tesla could theoretically create a new business model based on its AI division. Musk has said that the first version of Dojo will be tailored for Tesla computer vision labeling and training, which is great for FSD and for training Optimus, Tesla’s humanoid robot. But it wouldn’t be useful for much else. 

Musk has said that future versions of Dojo will be more tailored to general-purpose AI training. One potential problem with that is that almost all AI software out there has been written to work with GPUs. Using Dojo to train general-purpose AI models would require rewriting the software.

That is, unless Tesla rents out its compute, similar to how AWS and Azure rent out cloud computing capabilities. Musk also noted during Q2 earnings that he sees “a path to being competitive with Nvidia with Dojo.”

A September 2023 report from Morgan Stanley predicted that Dojo could add $500 billion to Tesla’s market value by unlocking new revenue streams in the form of robotaxis and software services. 

In short, Dojo’s chips are an insurance policy for the automaker, but one that could pay dividends. 

How far along is Dojo?

Nvidia CEO Jensen Huang and Tesla CEO Elon Musk at the GPU Technology Conference in San Jose, California.

Reuters reported last year that Tesla began production on Dojo in July 2023, but a June 2023 post from Musk suggested that Dojo had been “online and running useful tasks for a few months.”

Around the same time, Tesla said it expected Dojo to be one of the top five most powerful supercomputers by February 2024. No such milestone has been publicly announced, leaving us doubtful that it has occurred.

The company also said it expects Dojo’s total compute to reach 100 exaflops in October 2024. (One exaflops is equal to 1 quintillion computer operations per second. To reach 100 exaflops, and assuming that one D1 can achieve 362 teraflops, Tesla would need more than 276,000 D1s, or around 320,500 Nvidia A100 GPUs.)
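The chip counts in that parenthetical can be reproduced with a short calculation. This is a back-of-the-envelope sketch: the D1 figure is the one cited above, while the 312-teraflops A100 figure is an assumption on our part (Nvidia's published peak for dense BF16/FP16 tensor operations), chosen because it matches the article's result. The helper name `chips_needed` is hypothetical.

```python
import math

TARGET_EXAFLOPS = 100
TERAFLOPS_PER_EXAFLOPS = 1_000_000  # 1 exaflops = 1,000,000 teraflops

D1_TERAFLOPS = 362    # per-chip figure cited in the article
A100_TERAFLOPS = 312  # ASSUMPTION: Nvidia's peak dense BF16/FP16 tensor figure


def chips_needed(target_exaflops: float, chip_teraflops: float) -> int:
    """Smallest whole number of chips whose combined peak meets the target."""
    return math.ceil(target_exaflops * TERAFLOPS_PER_EXAFLOPS / chip_teraflops)


d1_count = chips_needed(TARGET_EXAFLOPS, D1_TERAFLOPS)
a100_count = chips_needed(TARGET_EXAFLOPS, A100_TERAFLOPS)

print(f"D1 chips needed:   {d1_count:,}")
print(f"A100 GPUs needed:  {a100_count:,}")
```

These are peak figures. Real training workloads rarely sustain peak throughput, so the number of chips needed in practice would likely be higher.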

Tesla also pledged in January 2024 to spend $500 million to build a Dojo supercomputer at its gigafactory in Buffalo, New York.

In May 2024, Musk noted that the rear portion of Tesla’s Austin gigafactory will be reserved for a “super dense, water-cooled supercomputer cluster.”

Just after Tesla’s second-quarter earnings call, Musk posted on X that the automaker’s AI team is using Tesla’s HW4 AI computer (renamed AI4), the hardware that lives on Tesla vehicles, in the training loop with Nvidia GPUs. He noted that the breakdown is roughly 90,000 Nvidia H100s plus 40,000 AI4 computers.

“And Dojo 1 will have roughly 8k H100-equivalent of training online by end of year,” he continued. “Not massive, but not trivial either.”
