China’s cheap, open AI model DeepSeek thrills scientists

A Chinese-built large language model called DeepSeek-R1 is thrilling scientists as an affordable and open rival to ‘reasoning’ models such as OpenAI’s o1.

These models generate responses step-by-step, in a process analogous to human reasoning. This makes them more adept than earlier language models at solving scientific problems and could make them useful in research. Initial tests of R1, released on 20 January, show that its performance on certain tasks in chemistry, mathematics and coding is on par with that of o1 — which wowed researchers when it was released by OpenAI in September.

“This is wild and totally unexpected,” Elvis Saravia, an AI researcher and co-founder of the UK-based AI consulting firm DAIR.AI, wrote on X.

R1 stands out for another reason. DeepSeek, the start-up in Hangzhou that built the model, has released it as ‘open-weight’, meaning that researchers can study and build on the algorithm. Published under an MIT licence, the model can be freely reused but is not considered fully open source, because its training data has not been made available.

“The openness of DeepSeek is quite remarkable,” says Mario Krenn, leader of the Artificial Scientist Lab at the Max Planck Institute for the Science of Light in Erlangen, Germany. By comparison, o1 and other models built by OpenAI in San Francisco, California, including its latest effort, o3, are “essentially black boxes”, he says.

DeepSeek hasn’t released the full cost of training R1, but it is charging users around one-thirtieth of what o1 costs to run. The firm has also created mini ‘distilled’ versions of R1 to allow researchers with limited computing power to play with the model. An “experiment that cost more than £300 with o1, cost less than $10 with R1,” says Krenn. “This is a dramatic difference which will certainly play a role in its future adoption.”

Challenge models

R1 is part of a boom in Chinese large language models (LLMs). Spun out of a hedge fund, DeepSeek emerged from relative obscurity last month when it released a chatbot called V3, which outperformed major rivals, despite being built on a shoestring budget. Experts estimate that it cost around $6 million to rent the hardware needed to train the model, compared with upwards of $60 million for Meta’s Llama 3.1 405B, which used 11 times the computing resources.

Part of the buzz around DeepSeek is that it has succeeded in making R1 despite US export controls that limit Chinese firms’ access to the best computer chips designed for AI processing. “The fact that it comes out of China shows that being efficient with your resources matters more than compute scale alone,” says François Chollet, an AI researcher in Seattle, Washington.

DeepSeek’s progress suggests that “the perceived lead [the] US once had has narrowed significantly,” wrote Alvin Wang Graylin, a technology expert in Bellevue, Washington, who works at the Taiwan-based immersive technology firm HTC, on X. “The two countries need to pursue a collaborative approach to building advanced AI vs continuing on the current no-win arms race approach.”

Chain of thought

LLMs train on billions of samples of text, snipping them into word-parts called ‘tokens’ and learning patterns in the data. These associations allow the model to predict subsequent tokens in a sentence. But LLMs are prone to inventing facts, a phenomenon called ‘hallucination’, and often struggle to reason through problems.
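
The idea of predicting the next token from learned patterns can be illustrated with a toy example. The sketch below is not how R1 or any production LLM works: it assumes a whitespace splitter in place of a real subword tokenizer and a simple count of which token follows which in place of a neural network. The function names and the three-sentence corpus are hypothetical, chosen only for illustration.

```python
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: lowercase whitespace splitting stands in for a real
    # subword tokenizer, which would cut words into smaller pieces.
    return text.lower().split()


def train_bigram(corpus):
    # For every token, count which tokens were seen immediately after it.
    counts = defaultdict(Counter)
    for sentence in corpus:
        tokens = tokenize(sentence)
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts


def predict_next(counts, token):
    # Return the most frequently observed follower, or None if the
    # token never appeared with a successor in the training text.
    followers = counts.get(token)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Hypothetical miniature training corpus.
corpus = [
    "the model predicts the next token",
    "the model learns patterns in text",
    "the next token follows the pattern",
]
counts = train_bigram(corpus)
print(predict_next(counts, "next"))
```

Here, ‘next’ is followed by ‘token’ in both sentences that contain it, so that is the prediction. A token the model never saw in training yields no prediction at all, which hints at why pattern-matching on text alone can fall short when a problem calls for reasoning.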
