Live News US7

China’s cheap, open AI model DeepSeek thrills scientists

Chinese firm DeepSeek debuted a version of its large language model last year. Credit: Koshiro K/Alamy

A Chinese-built large language model called DeepSeek-R1 is thrilling scientists as an affordable and open rival to ‘reasoning’ models such as OpenAI’s o1.

These models generate responses step-by-step, in a process analogous to human reasoning. This makes them more adept than earlier language models at solving scientific problems and could make them useful in research. Initial tests of R1, released on 20 January, show that its performance on certain tasks in chemistry, mathematics and coding is on par with that of o1 — which wowed researchers when it was released by OpenAI in September.

“This is wild and totally unexpected,” Elvis Saravia, an AI researcher and co-founder of the UK-based AI consulting firm DAIR.AI, wrote on X.

R1 stands out for another reason. DeepSeek, the start-up in Hangzhou that built the model, has released it as ‘open-weight’, meaning that researchers can study and build on the algorithm. Published under an MIT licence, the model can be freely reused but is not considered fully open source, because its training data has not been made available.

“The openness of DeepSeek is quite remarkable,” says Mario Krenn, leader of the Artificial Scientist Lab at the Max Planck Institute for the Science of Light in Erlangen, Germany. By comparison, o1 and other models built by OpenAI in San Francisco, California, including its latest effort, o3, are “essentially black boxes”, he says.

DeepSeek hasn’t released the full cost of training R1, but it is charging users around one-thirtieth of what o1 costs to run. The firm has also created mini ‘distilled’ versions of R1 to allow researchers with limited computing power to play with the model. An “experiment that cost more than £300 with o1, cost less than $10 with R1,” says Krenn. “This is a dramatic difference which will certainly play a role in its future adoption.”

Challenge models

R1 is part of a boom in Chinese large language models (LLMs). Spun out of a hedge fund, DeepSeek emerged from relative obscurity last month when it released a chatbot called V3, which outperformed major rivals, despite being built on a shoestring budget. Experts estimate that it cost around $6 million to rent the hardware needed to train the model, compared with upwards of $60 million for Meta’s Llama 3.1 405B, which used 11 times the computing resources.

Part of the buzz around DeepSeek is that it has succeeded in making R1 despite US export controls that limit Chinese firms’ access to the best computer chips designed for AI processing. “The fact that it comes out of China shows that being efficient with your resources matters more than compute scale alone,” says François Chollet, an AI researcher in Seattle, Washington.

DeepSeek’s progress suggests that “the perceived lead [the] US once had has narrowed significantly,” wrote Alvin Wang Graylin, a technology expert in Bellevue, Washington, who works at the Taiwan-based immersive technology firm HTC, on X. “The two countries need to pursue a collaborative approach to building advanced AI vs continuing on the current no-win arms race approach.”

Chain of thought

LLMs train on billions of samples of text, snipping them into word-parts called ‘tokens’ and learning patterns in the data. These associations allow the model to predict subsequent tokens in a sentence. But LLMs are prone to inventing facts, a phenomenon called ‘hallucination’, and often struggle to reason through problems.
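
The idea of predicting the next token from learned patterns can be illustrated with a toy sketch. The example below is not how DeepSeek-R1 or any real LLM works: it assumes whitespace-separated words as tokens and simply counts which token most often follows another, whereas real models use subword tokens and neural networks trained on vastly more text. The function names and the three-sentence corpus are hypothetical, chosen only for illustration.

```python
from collections import Counter, defaultdict


def tokenize(text):
    """Split text into lowercase words (a toy stand-in for subword tokens)."""
    return text.lower().split()


def train_bigram(corpus):
    """Count, for every token, how often each other token follows it."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        tokens = tokenize(sentence)
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts


def predict_next(counts, token):
    """Return the most frequent follower of `token`, or None if unseen."""
    followers = counts.get(token)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Hypothetical miniature corpus, for illustration only.
corpus = [
    "the model predicts the next token",
    "the model learns patterns in text",
    "a token follows the model",
]
counts = train_bigram(corpus)
print(predict_next(counts, "the"))     # "model" follows "the" three times, "next" once
print(predict_next(counts, "banana"))  # never seen, so nothing to predict
```

A counter like this can only repeat associations it has seen, which hints at why pattern-based prediction alone does not guarantee factual or well-reasoned output.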
