-->

Huawei Ascend 910C Breakthrough: Powering DeepSeek V4 Pro Post-Training

The landscape of artificial intelligence in China is shifting as domestic hardware begins to handle the most demanding stages of model development. Huawei's AI chips have already been used for AI inference, and a recent project has now shown they can also handle the more demanding task of post-training. A research team used Ascend processors to refine the DeepSeek V4 Pro model, a significant milestone in the push for technological self-reliance.

Article Summary

  • ✨ Huawei's Ascend 910C chips successfully completed the full-parameter post-training of the DeepSeek V4 Pro model.
  • ✨ The project involved a massive computing cluster of approximately 1,000 chips working in tandem.
  • ✨ This achievement signals a transition from simple AI inference to complex AI model training on domestic Chinese hardware.
  • ✨ The training remained stable through 1,500 training updates, pointing to the reliability of the Ascend ecosystem.

Bridging the Gap: From Inference to Full-Scale Training

For years, the primary challenge for Chinese chipmakers has not been inference (running a completed model) but the far more intensive training phase. Training requires tight synchronization and constant communication among the hundreds or thousands of chips in a cluster. According to recent reports, a collaboration between Huawei and several institutions, including the Shenzhen Institute of Big Data and the Harbin Institute of Technology, has now bridged this gap.
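The synchronization requirement can be illustrated with a toy simulation of data-parallel training, in which each chip computes a gradient on its own slice of data and all chips then agree on the average. This is a minimal sketch in plain Python: the "workers", the one-weight model, and the function names are all hypothetical, and nothing here uses a real Ascend or collective-communication API.

```python
# Toy simulation of gradient synchronization in data-parallel training.
# Each "worker" stands in for one accelerator; all names are illustrative.

def local_gradient(weight, batch):
    """Mean squared-error gradient for the model y = weight * x on one batch."""
    return sum(2 * (weight * x - y) * x for x, y in batch) / len(batch)

def all_reduce_mean(values):
    """Stand-in for an all-reduce: every worker receives the same average."""
    mean = sum(values) / len(values)
    return [mean] * len(values)

# Two workers, each holding an equal-sized slice of the data (y = 2x).
worker_batches = [
    [(1.0, 2.0), (2.0, 4.0)],
    [(3.0, 6.0), (4.0, 8.0)],
]
weight = 0.0

# Step 1: each worker computes a gradient locally, with no communication.
local_grads = [local_gradient(weight, batch) for batch in worker_batches]

# Step 2: workers exchange gradients; this is the communication-heavy part.
synced_grads = all_reduce_mean(local_grads)

# With equal-sized batches, the synchronized gradient matches what a single
# machine would compute over all of the data at once.
all_data = [pair for batch in worker_batches for pair in batch]
single_machine_grad = local_gradient(weight, all_data)
```

In a real cluster the exchange in step 2 happens on every training step and involves gradients for every trainable weight, which is why interconnect speed matters as much as raw compute.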

On June 5, researchers deployed a cluster of roughly 1,000 Ascend 910C processors to run the post-training of DeepSeek V4 Pro. Unlike partial fine-tuning, this was a full-parameter training exercise, meaning every weight in the model was updated rather than a selected subset. The result matters because it reduces dependency on restricted foreign hardware, such as Nvidia's H800.
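The difference between full-parameter training and partial fine-tuning comes down to how many weights receive updates. The sketch below makes that concrete with an invented toy model; the parameter groups and their sizes are assumptions for illustration only and do not describe DeepSeek V4 Pro.

```python
# Hypothetical parameter groups for a toy model (sizes are made up).
param_counts = {
    "embedding": 1_000,
    "layer_0": 4_000,
    "layer_1": 4_000,
    "output_head": 1_000,
}

def trainable_fraction(param_counts, trainable_groups):
    """Share of all parameters that would receive gradient updates."""
    total = sum(param_counts.values())
    trainable = sum(
        count for name, count in param_counts.items() if name in trainable_groups
    )
    return trainable / total

# Full-parameter training: every group is trainable.
full = trainable_fraction(param_counts, set(param_counts))

# Partial fine-tuning: here, only the output head is trainable.
partial = trainable_fraction(param_counts, {"output_head"})

print(f"full-parameter: {full:.0%}, head-only: {partial:.0%}")
```

Because gradients must be computed, stored, and synchronized for every trainable weight, a full-parameter run places a much heavier load on memory and inter-chip communication than a partial one.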

Understanding Post-Training and Model Refinement

To appreciate this achievement, one must distinguish between the stages of building an AI model. Pre-training teaches a model the basics of language by having it absorb large amounts of data, while post-training is what makes the AI useful and safe for people. It involves teaching the model to follow specific instructions, adhere to safety guidelines, and handle complex logic.

A recent social media post aptly described the difference: domestic computing power used to be a "one-way road" where you input a question and get an answer. The new project has transformed this into a complex network of "flyovers and loops," allowing the model to self-reflect and adjust dynamically. This shift sharply increases the demand for computational speed and inter-chip communication.

The Future of Domestic AI Infrastructure

Previously, models like DeepSeek V3 relied on thousands of Nvidia H800 chips. The successful post-training of DeepSeek V4 Pro on Huawei hardware indicates that the Huawei ecosystem is now stable and effective enough for this kind of work. The run completed over 1,500 training updates without reported failures, and the resulting DeepSeek V4 Pro is said to show significantly improved mathematical capabilities and a more refined user experience.

Huawei AI chips used for DeepSeek V4 training

(Image Credits: Huawei)

What is the difference between AI training and AI inference?

Inference is the process of using a finished AI model to answer queries, which is comparatively light computationally because the model's weights are only read. Training, including post-training, is the process of teaching the model how to behave, follow rules, and solve complex problems. It requires far more computational power because the model's outputs are repeatedly scored and its weights adjusted in response.
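As a rough illustration of that difference, the sketch below contrasts the two on a one-weight toy model. It is not how a large language model is trained in practice; the model, data, learning rate, and function names are all assumptions chosen to keep the example small.

```python
def predict(weight, x):
    """Inference: one forward pass; the weight is read but never changed."""
    return weight * x

def train(weight, data, learning_rate=0.05, epochs=100):
    """Training: repeated forward pass, error measurement, and weight update."""
    for _ in range(epochs):
        for x, target in data:
            error = predict(weight, x) - target   # forward pass and error
            gradient = 2 * error * x              # gradient of squared error
            weight -= learning_rate * gradient    # adjust the weight
    return weight

# Toy data following target = 3 * x.
data = [(1.0, 3.0), (2.0, 6.0)]

trained_weight = train(0.5, data)      # many passes over the data
answer = predict(trained_weight, 4.0)  # a single pass
```

Even in this tiny case, training calls the forward pass 200 times and updates the weight after each one, while answering a query takes a single call.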

Which Huawei chips were used for the DeepSeek V4 Pro training?

The research team utilized a computing cluster consisting of approximately 1,000 Huawei Ascend 910C chips. These chips are designed specifically for high-performance AI workloads and are seen as a domestic alternative to restricted high-end GPUs.

How stable was the training process on the Ascend hardware?

The training process was reported to be stable, completing over 1,500 training updates without system failures. This level of stability is an important indicator that the hardware and software stack is ready for large-scale industrial use.

Why is this a breakthrough for the Chinese AI industry?

It demonstrates that China can now perform full-parameter post-training of advanced Large Language Models (LLMs) using entirely domestic hardware. This reduces the industry's vulnerability to international trade restrictions and hardware sanctions.

🔎 This breakthrough in post-training with Huawei’s Ascend 910C processors represents a defining moment for the global AI landscape. By successfully refining the DeepSeek V4 Pro model through a full-parameter training process, Huawei and its partners have proven that domestic hardware can meet the extreme demands of modern artificial intelligence. As the ecosystem continues to mature, we can expect a rapid acceleration in the development of independent, high-performance AI models that are no longer tethered to a single source of global hardware.