At the end of 2025, NVIDIA announced a $20 billion acquisition of the emerging AI chip company Groq. The record-breaking deal hit the silicon world like a midwinter earthquake, shattering its calm. Groq had earned the title of "strongest inference chip on Earth" by delivering inference more than ten times faster than traditional GPUs, and the industry reads NVIDIA's move as a watershed in the paradigm shift of computing power.
It sends a clear signal to the world: the focal point of the AI chip war is rapidly shifting from training to inference. As large models leave the laboratory and enter countless application scenarios, the cost and speed of inference will directly determine the life and death of the AI industry. Viewed against the broad backdrop of 2025, this transformation not only marks a reconstruction of industry logic, but also signals that inference computing power has become the next high ground in great-power rivalry and technological competition.
Looking back at 2025, demand for computing power shifted from a simple "parameter race" to "application deployment", directly triggering a collective breakthrough by domestic AI chip companies in the capital markets. Even against the backdrop of fierce Sino-US technological competition, domestically produced chips demonstrated astonishing resilience.
After Horizon Robotics and Black Sesame listed in Hong Kong in 2024, 2025 became a true "year of IPOs": Moore Threads and Muxi Technology officially listed on the A-share STAR Market, while Tianshu Zhixin and Biren Technology also moved toward Hong Kong listings. The continuous injection of capital, together with the large-scale delivery of several domestic ten-thousand-card computing clusters, marks the transition of China's chip industry from a "supplement" to a "core force". Observed from the year's end, the AI chip industry has clearly evolved from its early chaos into two major camps: the "general-purpose GPU faction" and the "inference faction".
The first faction: the "general-purpose GPU faction" that pursues ecological moats
This faction remains the foundation of the global computing power system, committed to building large-scale computing resource pools in the cloud: the cradle in which all massive-parameter models are born. Internationally, NVIDIA has built a near-monopolistic hardware performance barrier with its Blackwell architecture (B200) and the deeply integrated NVLink high-speed interconnect; but its real trump card is the CUDA ecosystem it has cultivated for nearly two decades. This deep integration of software and hardware has accustomed millions of developers worldwide to NVIDIA's underlying logic, making migration costs extremely high.
As a challenger, AMD is attempting to break the "one superpower" pattern, seeking cracks in high-performance computing and the customized needs of specific cloud vendors by leveraging its open-source ROCm ecosystem and the cost-effectiveness of the MI300 series.
Back in the domestic market, companies such as Moore Threads, Biren Technology, and Muxi Technology, staunch practitioners of this route, completed a key leap from "benchmark scores" to "ten-thousand-card cluster validation" by 2025. They are committed not only to seamless compatibility with mainstream software ecosystems at the architectural level, but also invest heavily in distributed computing efficiency and the universality of fully functional GPU architectures.
For domestic large-model developers, this faction carries enormous strategic significance: with high computing power and high usability, they have preserved precious seeds for the iteration of domestic computing power and solved the zero-to-one problem of "computing power availability" under heavy blockade. They are the "pioneering axes" of the computing-shortage era, focused on tackling massively parallel computation for extremely large parameter counts in the cloud, and laying the most stable and solid digital foundation for the subsequent explosion of applications across industries and scenarios.
Technologically, Moore Threads has chosen the more aggressive "fully functional GPU" direction: based on its independently developed MUSA unified architecture, a single card simultaneously supports AI training and inference, graphics rendering, video processing, and other scenarios. Built on the MUSA unified system, Moore Threads' new-generation fully functional GPU architecture "Huagang" has achieved comprehensive breakthroughs in compute density, energy efficiency, precision support, interconnect, and graphics technology.
Biren Technology and Muxi Technology focus on general-purpose GPUs, targeting the cloud computing market. Biren positions itself as a maker of "high-end general-purpose GPUs"; using chiplet-based heterogeneous integration, it launched the BR100, a chip benchmarked against NVIDIA's H100, making it the representative enterprise of the domestic GPU "technology ceiling".
With independently developed GPU IP at its core, Muxi Technology has overcome the dual-scenario compatibility challenge of high-performance computing plus AI training. Its MXMACA software stack is compatible with the CUDA ecosystem, allowing applications running on NVIDIA GPU platforms to be migrated directly. This addresses the industry pain point of "high ecological migration costs" and positions Muxi to quickly absorb part of the domestic market left behind by NVIDIA.
As the first domestic company to mass-produce general-purpose GPUs for both training and inference, Tianshu Zhixin has adhered to a long-term philosophy and, through multiple generations of product iteration, achieved the leap from "following" to "running in parallel" in general-purpose GPUs. Its products are fully compatible with mainstream AI frameworks at home and abroad, including TensorFlow, PyTorch, and PaddlePaddle, as well as various deep-learning acceleration libraries, and their standardized interfaces cut application migration time by more than 50%.
The domestic general-purpose GPU market is currently enjoying a double dividend of "exploding demand plus domestic substitution". The market is estimated at 154.6 billion yuan in 2024 and projected to grow to 715.3 billion yuan by 2029, with domestic manufacturers' share expected to exceed 50%. As their multidimensional competitiveness continues to strengthen, domestic general-purpose GPU companies are expected to further expand their share of this substitution market worth hundreds of billions of yuan, driving the industry's transformation from follower to leader.
The second faction: the "inference faction" that digs deep into the efficiency gap
As large models enter the stage of large-scale commercialization, the industry's focus is rapidly shifting from "how to train" to "how to deploy", giving rise to the "inference faction".
Unlike the training side's obsession with "brute-force computing", the inference side emphasizes processing performance per unit cost, deterministic latency, and extreme energy efficiency. On this track, Google's TPU and Amazon's Inferentia are each seeking the optimal efficiency solution through self-developed architectures. Behind this market shift lies the inevitable transformation of the AI industry from "laboratory input" to "commercial output".
Inference chips attract such attention because their core logic directly determines the token economics of AI applications. With exploding demand for long-text processing, real-time voice dialogue, and multimodal generation, computing power consumption is no longer a one-off R&D investment, but a continuous operating cost incurred with every user interaction. General-purpose GPUs, however powerful, often waste compute resources and incur high power costs when executing a single inference task. By contrast, chips designed specifically for inference loads can deliver several times the cost-effectiveness of general architectures by simplifying control logic and optimizing the ratio of memory bandwidth to compute. Only when inference costs fall into a range affordable to both businesses and individuals does universal AI access have a practical foundation.
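The bandwidth-to-compute argument above can be made concrete with a back-of-envelope sketch. The hardware numbers below are hypothetical placeholders, not the specifications of any chip named in this article; the point is only that, during token generation, throughput is roughly bounded by how fast weights can be streamed from memory, so per-token operating cost tracks memory bandwidth, not peak compute.

```python
# Back-of-envelope sketch: why the memory-bandwidth-to-compute ratio
# dominates inference economics. All numbers are illustrative assumptions.

def decode_tokens_per_sec(model_bytes: float, mem_bandwidth_bps: float) -> float:
    """During decode, each generated token must stream the full model
    weights from memory, so throughput ~ bandwidth / model size."""
    return mem_bandwidth_bps / model_bytes

def cost_per_million_tokens(tokens_per_sec: float, chip_cost_per_hour: float) -> float:
    """Operating cost attributed to every million generated tokens."""
    tokens_per_hour = tokens_per_sec * 3600
    return chip_cost_per_hour / tokens_per_hour * 1e6

# Hypothetical example: a 70B-parameter model in 8-bit weights needs
# roughly 70 GB streamed per generated token.
model_bytes = 70e9
gp_rate = decode_tokens_per_sec(model_bytes, mem_bandwidth_bps=3e12)   # ~3 TB/s part
asic_rate = decode_tokens_per_sec(model_bytes, mem_bandwidth_bps=8e12) # ~8 TB/s part
print(round(gp_rate, 1), round(asic_rate, 1))
```

Under these made-up numbers, the higher-bandwidth part generates tokens several times faster for the same model, which is exactly the "several times the cost-effectiveness" lever the paragraph describes.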
In this energy-efficiency race, domestic Chinese chip makers have demonstrated keen market insight and exceptionally strong scenario penetration. Leading domestic players such as Huawei's Ascend series, Cambricon, and Yuntian Lifei are accelerating their breakthroughs on this differentiated track through architectural innovation. Rather than simply chasing general-purpose performance metrics, these companies tend to seek the optimal balance within the golden triangle of computing power, power consumption, and cost, building deep technical barriers on the "long board" of inference performance. This deep adaptation to large-scale local application scenarios is accelerating domestic chips' shift from simple hardware delivery to full-stack energy-efficiency services, reserving highly resilient growth space in the future trillion-level inference market.
Unlike NVIDIA GPUs, which are general-purpose integrated circuits, Huawei's Ascend series chips are NPUs built on a domain-specific architecture, purpose-designed for neural-network computing tasks. Since 2019, Huawei has released multiple chips in the Ascend 910 series, including the 910B and 910C. The series is based on Huawei's self-developed Da Vinci architecture, compensates for the performance limits of a single die through clustering and scale-out, and is designed for cloud AI training and inference.
The core advantage of Cambricon, known as the "first domestic AI chip stock", lies in its full-stack technology layout and large-scale deployment capability. It is currently the only domestic chip company to achieve "cloud-edge-device integration". Technologically, it adopts a "software-hardware co-design plus unified training-and-inference" architecture, independently develops its intelligent processor instruction sets and microarchitectures, enjoys high core-technology barriers, and has become a "phenomenon-level" enterprise in the domestic chip field.
Yuntian Lifei focuses on the AI inference track, committed to building a "Chinese TPU". Its independently developed GPNPU is based on a "computing-power building block" architecture that balances generality and efficiency, enabling flexible scaling of compute units on domestic process technology to meet diverse scenario demands. The company's chips, including "Shenjie", "Shenqiong", and "Shenqing", are being deployed in intelligent computing centers, embodied intelligence, and other fields, providing strong domestic support for benchmark AI applications across industries.
In AI's journey from the laboratory to large-scale deployment, inference is becoming the core battleground that determines both experience and cost: chips optimized specifically for inference have become a new trend in the technology industry.
In the training era, NVIDIA is unquestionably the king and the standard-setter. To catch up on the training track, domestic players must face the hard realities of restricted advanced process nodes and the high wall of the CUDA ecosystem; the gap objectively exists. But the inference track presents a different scene. "In the inference era, everyone stands on the same new starting line. Whoever can establish advantages in cost, efficiency, and system capability has the opportunity," said Yuntian Lifei CEO Chen Ning.
Looking ahead: from "brute-force computing" to "refined operation"
Cost is the most real mountain standing before AI at scale. Looking ahead to 2026, the AI chip industry will no longer compete on absolute performance alone, but will enter a new era of specialization and refinement. The most significant trend is the thorough separation of training and inference: the "resource mismatch" of using expensive training chips for simple inference tasks will come to an end, and chips designed specifically for inference optimization will become the market's mainstream choice.
Meanwhile, the large-scale adoption of cutting-edge architectures such as PD separation (separating the Prefill and Decode stages) will perform "precision surgery" on the distinct load characteristics of the different stages of large-model generation. This technical evolution not only raises the throughput ceiling of computing power, but also dramatically reduces the marginal cost of AI applications.
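The intuition behind PD separation can be sketched in a few lines. Prefill processes the entire prompt in parallel (compute-bound), while decode emits one token per step and re-reads the model each time (bandwidth-bound), so the two stages can be placed on differently optimized hardware. The function names and toy "KV cache" below are illustrative only, not any vendor's API.

```python
# Minimal sketch of the prefill/decode (PD) split in autoregressive
# generation. The list used as a KV cache is a stand-in for real
# per-token key/value tensors.

def prefill(prompt_tokens):
    """Process all prompt tokens at once and build the KV cache.
    In a PD-separated deployment this stage runs on compute-optimized
    hardware, then hands the cache to the decode stage."""
    kv_cache = list(prompt_tokens)  # one cache entry per prompt token
    return kv_cache

def decode(kv_cache, max_new_tokens):
    """Generate tokens one at a time, growing the KV cache each step.
    In a PD-separated deployment this stage runs on bandwidth-optimized
    hardware, receiving the cache produced by prefill."""
    generated = []
    for i in range(max_new_tokens):
        token = f"<tok{i}>"    # stand-in for a sampled token
        kv_cache.append(token)  # cache grows by exactly one entry per step
        generated.append(token)
    return generated

cache = prefill(["how", "fast", "is", "inference"])
out = decode(cache, max_new_tokens=3)
print(len(cache), len(out))  # prints: 7 3
```

Because prefill touches many tokens once and decode touches the whole model many times, serving them from a shared pool forces one chip to be mediocre at both; splitting them is the "precision surgery" the paragraph refers to.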
In this era of intense competition, computing power is not only a contest of technology, but a symbol of sovereignty. As the prelude to large-scale inference begins, the leading force of domestic inference chips, represented by Yuntian Lifei, is building a stable, affordable, and far-reaching "new foundation" for China's AI industry on independent and controllable underlying technology. This is not merely a matter of winning or losing between enterprises, but a key pivot for China to grasp digital sovereignty and drive qualitative change across industries in the era of intelligent computing.
What the inference-chip race really contests is the ability to bring AI into the real world. Here, for the first time, Chinese companies stand on a starting line similar to their global competitors'. The final outcome of this competition may not be the birth of a single giant that replaces NVIDIA, but the growth of a new force that cultivates "grain-producing regions" such as government, finance, and industry, providing stable, reliable, and cost-effective computing services.