    Explaining DeepSeek: the Chinese Model’s Efficiency Is Scaring Markets

By Press Room · January 27, 2025 · 5 min read
    • China’s DeepSeek model challenges US AI firms with cost-effective, efficient performance.
    • DeepSeek’s model is 20-40 times cheaper than OpenAI’s, using modest hardware.
    • DeepSeek’s efficiency raises questions about US investments in AI infrastructure.

    The bombshell that is China’s DeepSeek model has set the AI ecosystem alight.

    The models are high-performing, relatively cheap, and compute-efficient, which has led many to posit that they pose an existential threat to American companies like OpenAI and Meta — and the trillions of dollars going into building, improving, and scaling US AI infrastructure.

    The price of DeepSeek’s open-source model is competitive — 20 to 40 times cheaper to run than comparable models from OpenAI, according to Bernstein analysts.

    But the potentially more nerve-racking element in the DeepSeek equation for US-built models is the relatively modest hardware stack used to build them.

    The DeepSeek-V3 model, which is most comparable to OpenAI’s ChatGPT, was trained on a cluster of 2,048 Nvidia H800 GPUs, according to the technical report published by the company.

    H800s are the first version of Nvidia’s defeatured chip for the Chinese market, designed to comply with US export controls. After the controls were amended, the company made another defeatured chip, the H20, to comply with the changes.

    Though this may not always be the case, chips are the most substantial cost in the large language model training equation. Being forced to use less powerful, cheaper chips creates a constraint that the DeepSeek team has ostensibly overcome.

    “Innovation under constraints takes genius,” Sri Ambati, CEO of open-source AI platform H2O.ai, told Business Insider.

    Even on subpar hardware, training DeepSeek-V3 took less than two months, according to the report.

    The efficiency advantage

    DeepSeek-V3 is small relative to its capabilities: it has 671 billion parameters, while GPT-4 reportedly has 1.76 trillion, which makes DeepSeek’s model easier to run. Yet it still hits impressive benchmarks of understanding.

    Its smaller size comes in part from a different architecture from ChatGPT’s, called a “mixture of experts.” The model has pockets of expertise built in, which go into action when called upon and sit dormant when irrelevant to the query. This type of model is growing in popularity, and DeepSeek’s advantage is that it built an extremely efficient version of an inherently efficient architecture.
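    The routing idea behind a mixture of experts can be sketched in a few lines. This is a minimal illustration, not DeepSeek’s actual implementation: a router scores a handful of toy “experts” (here just small weight matrices) and only the top-k run for a given input, so most parameters stay dormant on any one query.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy mixture-of-experts layer: a router picks the top-k experts per input,
# so only a fraction of the layer's parameters are active for any one query.
NUM_EXPERTS, TOP_K, DIM = 8, 2, 16

# Each "expert" is just a small weight matrix in this sketch.
experts = [rng.standard_normal((DIM, DIM)) for _ in range(NUM_EXPERTS)]
router_w = rng.standard_normal((DIM, NUM_EXPERTS))

def moe_forward(x):
    logits = x @ router_w                # router score for each expert
    top = np.argsort(logits)[-TOP_K:]    # keep only the top-k experts
    weights = np.exp(logits[top])
    weights /= weights.sum()             # softmax over the selected experts
    # Only the selected experts compute; the other 6 stay dormant.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

y = moe_forward(rng.standard_normal(DIM))
```

    With 8 experts and top-2 routing, each query touches only a quarter of the expert parameters, which is the source of the efficiency the article describes.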

    “Someone made this analogy: It’s almost as if someone released a $20 iPhone,” Foundry CEO Jared Quincy Davis told BI.

    The Chinese model used a fraction of the time, a fraction of the number of chips, and a less-capable, less expensive chip cluster. Essentially, it’s a drastically cheaper, competitively capable model that the firm is virtually giving away for free.

    The model that is even more concerning from a competitive perspective, according to Bernstein, is DeepSeek-R1, a reasoning model more comparable to OpenAI’s o1 or o3. This model uses reasoning techniques to interrogate its own responses and thinking. The result is competitive with OpenAI’s latest reasoning models.

    R1 was built on top of V3, and the research paper released alongside the more advanced model doesn’t include information about the hardware stack behind it. But DeepSeek used strategies like generating its own training data to train R1, which requires more compute than using data scraped from the internet or generated by humans.

    This technique is often referred to as “distillation” and is becoming a standard practice, Ambati said.
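    The core mechanic of distillation can be shown with a toy example. This is an illustrative sketch with made-up shapes and data, not DeepSeek’s training recipe: a small “student” model is fit to the soft output distribution of a fixed “teacher”, rather than to hard human labels.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Hypothetical teacher: a fixed linear map producing logits over 4 classes.
teacher_w = rng.standard_normal((8, 4))
student_w = np.zeros((8, 4))

X = rng.standard_normal((256, 8))
teacher_probs = softmax(X @ teacher_w)   # soft targets generated by the teacher

# Train the student by gradient descent on cross-entropy vs. the soft targets.
lr = 0.5
for _ in range(200):
    student_probs = softmax(X @ student_w)
    grad = X.T @ (student_probs - teacher_probs) / len(X)
    student_w -= lr * grad

# Fraction of inputs where the student now agrees with the teacher's top pick.
agreement = (softmax(X @ student_w).argmax(1) == teacher_probs.argmax(1)).mean()
```

    The point of the sketch is the data flow: the teacher’s outputs, not human annotations, are the training signal, which is why distillation of a rival’s model raises the licensing questions the article goes on to describe.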

    Distillation brings with it another layer of controversy, though. A company using its own models to distill a smarter, smaller model is one thing. But the legality of using other companies’ models to distill new ones depends on licensing.

    Still, DeepSeek’s techniques are iterative and likely to be taken up by the AI industry immediately.

    For years, model developers and startups have focused on smaller models since their size makes them cheaper to build and operate. The thinking was that small models would serve specific tasks. But what DeepSeek and potentially OpenAI’s o3 mini demonstrate is that small models can also be generalists.

    It’s not game over

    A coalition of players including Oracle and OpenAI, with cooperation from the White House, announced Stargate, a $500 billion data center project in Texas — the latest in a long, quick procession of large-scale conversions to accelerated computing. The shock from DeepSeek has called that investment into question, and the largest beneficiary, Nvidia, is on a roller coaster as a result. The company’s stock plummeted more than 13% Monday.

    But Bernstein said the response is out of step with reality.

    “DeepSeek DID NOT ‘build OpenAI for $5M’,” Bernstein analysts wrote in a Monday investor note. The panic, especially on X, is blown out of proportion, they added.

    DeepSeek’s own research paper on V3 explains: “the aforementioned costs include only the official training of DeepSeek-V3, excluding the costs associated with prior research and ablation experiments on architectures, algorithms, or data.” So the $5 million figure is only part of the equation.

    “The models look fantastic but we don’t think they are miracles,” Bernstein continued. Last week China also announced a roughly $140 billion investment in data centers, in a sign that infrastructure is still needed despite DeepSeek’s achievements.

    The competition for model supremacy is fierce, and OpenAI’s moat may indeed be in question. But demand for chips shows no signs of slowing, according to Bernstein. Tech leaders are circling back to a 19th-century economic adage to explain the moment.

    Jevons paradox is the idea that innovation begets demand. As technology gets cheaper or more efficient, demand increases much faster than prices drop. That’s what providers of computing power, like Davis, have been espousing for years. This week, Bernstein and Microsoft CEO Satya Nadella picked up the mantle too.
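    The arithmetic behind the paradox is simple to show with made-up numbers (these are illustrative, not market figures): if demand is elastic enough, halving the cost per unit of compute more than doubles the units consumed, so total spending rises.

```python
# Toy Jevons-paradox calculation with a constant-elasticity demand curve.
cost_per_unit = 1.0      # arbitrary baseline cost of a unit of compute
units_demanded = 100.0   # baseline demand at that cost
elasticity = -2.0        # assumed price elasticity (|e| > 1 is the Jevons regime)

new_cost = cost_per_unit * 0.5  # an efficiency gain halves the unit cost

# Constant-elasticity demand: q = q0 * (p / p0) ** elasticity
new_units = units_demanded * (new_cost / cost_per_unit) ** elasticity

old_spend = cost_per_unit * units_demanded   # 1.0 * 100 = 100.0
new_spend = new_cost * new_units             # 0.5 * 400 = 200.0
```

    Under these assumed numbers, spending doubles even though each unit got twice as cheap — the argument chip bulls are making about AI compute.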

    “Jevons paradox strikes again!” Nadella posted on X Monday morning. “As AI gets more efficient and accessible, we will see its use skyrocket, turning it into a commodity we just can’t get enough of,” he continued.
