DeepSeek Looks Fantastic But Not A Miracle And Not Built In $5m, Panic On It Seems Overblown: Bernstein Report

New Delhi: Amid growing excitement on social media and in stock markets over the emerging AI company DeepSeek, a report from Bernstein has clarified that while DeepSeek's models are impressive, they are not miraculous and were not built for USD 5 million.

The report specifically addressed the claims surrounding DeepSeek's models, particularly the notion that the company built an equivalent of OpenAI's models for a mere USD 5 million, emphasizing that such claims are misleading and omit important context.

The report stated, "we believe that DeepSeek DID NOT 'build OpenAI for USD 5M'; the models are impressive, but we do not consider them miraculous; the ensuing panic on Twitter over the weekend appears exaggerated."

According to Bernstein, DeepSeek has introduced two primary families of AI models: 'DeepSeek-V3' and 'DeepSeek R1'. The V3 model is a large language model that employs a Mixture-of-Experts (MoE) architecture.

This approach combines many smaller expert networks, only a few of which are activated for each token, achieving high performance while using significantly fewer computing resources than comparable dense models. The V3 model has 671 billion parameters in total, of which 37 billion are active at any moment.
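For readers unfamiliar with the technique, the sketch below shows the core MoE idea in PyTorch: a router picks the top-k experts for each token, so only a small fraction of the total parameters does work per token. This is an illustrative toy, not DeepSeek-V3's implementation; the expert count, top-k value, and layer sizes here are hypothetical.

```python
# Minimal sketch of Mixture-of-Experts (MoE) top-k routing.
# Illustrative only; NUM_EXPERTS, TOP_K, and D_MODEL are hypothetical.
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_EXPERTS = 8   # hypothetical; V3 uses far more experts
TOP_K = 2         # experts activated per token
D_MODEL = 64

class MoELayer(nn.Module):
    def __init__(self):
        super().__init__()
        # Each expert is a small feed-forward network.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(D_MODEL, 4 * D_MODEL), nn.GELU(),
                          nn.Linear(4 * D_MODEL, D_MODEL))
            for _ in range(NUM_EXPERTS)
        )
        # The router scores each token against every expert.
        self.router = nn.Linear(D_MODEL, NUM_EXPERTS)

    def forward(self, x):                      # x: (tokens, D_MODEL)
        scores = self.router(x)                # (tokens, NUM_EXPERTS)
        weights, idx = scores.topk(TOP_K, dim=-1)
        weights = F.softmax(weights, dim=-1)   # normalize over chosen experts
        out = torch.zeros_like(x)
        # Only the selected experts run for each token, which is why an
        # MoE model has many total parameters but few active per token.
        for k in range(TOP_K):
            for e in range(NUM_EXPERTS):
                mask = idx[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k:k+1] * self.experts[e](x[mask])
        return out

tokens = torch.randn(10, D_MODEL)
print(MoELayer()(tokens).shape)  # torch.Size([10, 64])
```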

Additionally, it incorporates advanced techniques such as Multi-head Latent Attention (MLA), which reduces memory consumption, and mixed-precision training using FP8 computation, improving efficiency.
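As a rough illustration of the mixed-precision idea, the sketch below uses PyTorch's torch.autocast with bfloat16; FP8 training requires specialized hardware kernels not shown here, but the principle is the same: run compute-heavy operations at lower precision while keeping master weights in full precision. The model and numbers are placeholders.

```python
# Mixed-precision training sketch using torch.autocast.
# bfloat16 stands in for FP8, which needs dedicated kernel support;
# the idea is identical: low-precision compute, full-precision weights.
import torch
import torch.nn as nn

model = nn.Linear(256, 256)          # toy stand-in for a transformer block
opt = torch.optim.AdamW(model.parameters(), lr=1e-3)
x = torch.randn(32, 256)

for step in range(3):
    opt.zero_grad()
    # Matmuls inside this context run in bfloat16; parameters stay fp32.
    with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
        loss = model(x).pow(2).mean()
    loss.backward()                  # gradients land in fp32
    opt.step()
    print(f"step {step}: loss={loss.item():.4f}")
```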

To train the V3 model, DeepSeek used a cluster of 2,048 NVIDIA H800 GPUs for approximately two months, totaling around 2.7 million GPU hours for pre-training and 2.8 million GPU hours including post-training.

Some estimates put the training cost at around USD 5 million based on a USD 2 per GPU hour rental rate, but the report highlights that this figure does not cover the extensive research, experimentation, and other expenses associated with the model's development.
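The arithmetic behind the headline number is easy to reproduce. The snippet below multiplies the GPU-hour figures cited in the report by the assumed USD 2 per hour rental rate, showing that the USD 5 million estimate covers compute rental alone.

```python
# Back-of-the-envelope check of the headline figure, using the
# GPU-hour numbers cited in the report and an assumed rental rate.
gpu_hours_pretrain = 2.7e6   # pre-training
gpu_hours_total = 2.8e6      # including post-training
rate_usd_per_hour = 2.0      # assumed market rental rate

print(f"pre-training compute: ~${gpu_hours_pretrain * rate_usd_per_hour / 1e6:.1f}M")
print(f"total compute:        ~${gpu_hours_total * rate_usd_per_hour / 1e6:.1f}M")
# ~$5.4M-$5.6M buys GPU time only; research staff, failed runs, and
# data work are excluded, which is the report's central caveat.
```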

The second model, 'DeepSeek R1', enhances the V3 framework by incorporating Reinforcement Learning (RL) and various other methodologies to markedly boost its reasoning abilities. The R1 model has demonstrated notable performance, competing effectively with OpenAI's models in reasoning challenges.
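The sketch below illustrates the general reinforcement-learning principle at toy scale: sample an output, score it with a reward, and nudge the policy toward higher-reward behaviour via a policy gradient. It is a generic REINFORCE example, not DeepSeek-R1's actual training recipe, and all names and values are hypothetical.

```python
# Toy REINFORCE-style sketch of the reinforcement-learning idea behind
# reasoning-focused fine-tuning. Generic illustration only; this is not
# DeepSeek-R1's training method.
import torch
import torch.nn as nn

policy = nn.Linear(4, 3)                      # toy "policy" over 3 actions
opt = torch.optim.Adam(policy.parameters(), lr=0.1)

def reward(action):
    # Stand-in for a verifiable reward, e.g. "did the answer check out?"
    return 1.0 if action == 2 else 0.0

state = torch.randn(4)
for step in range(200):
    dist = torch.distributions.Categorical(logits=policy(state))
    action = dist.sample()
    # Policy gradient: raise the log-probability of rewarded actions.
    loss = -dist.log_prob(action) * reward(action.item())
    opt.zero_grad()
    loss.backward()
    opt.step()

print(policy(state).softmax(-1))  # probability mass shifts toward action 2
```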

Nonetheless, the report indicated that the resources invested in the development of R1 were likely considerable, although the exact figures were not specified in the company's research documentation.

Hype aside, the report stressed that DeepSeek's models are genuinely noteworthy. The V3 model, for example, matches or surpasses other large models on language, coding, and mathematics benchmarks while using only a fraction of their computational resources.

In summary, the report concluded that while DeepSeek's accomplishments are impressive, the panic and the inflated claim that an OpenAI competitor was built for USD 5 million are overblown.
