The shakeout among China's domestic large-model vendors is accelerating.
This round of elimination will last one to two years, and only a handful of companies with truly strong foundation models will survive.
The price war in China's large-model market has now run for nearly half a year.
It has pushed gross margins into negative territory, and there is no sign of it stopping.
Leading cloud vendors are still preparing a new round of price cuts, to be implemented in late September this year.
In May this year, Chinese cloud vendors started a price war on the inference computing power of large models.
ByteDance's cloud service Volcano Engine, Alibaba Cloud, Baidu Intelligent Cloud, and Tencent Cloud successively reduced the price of large model inference computing power by more than 90%.
Using a large model means entering a prompt; after inference, the model returns its output.
This process calls an API (Application Programming Interface), and customers pay by the number of Tokens consumed (a Token is the large model's unit of text; it can be a word, a punctuation mark, a number, a symbol, and so on).
This is like paying for water and electricity according to the amount of use.
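A rough sketch of this metered billing model (the token counts and per-million-token prices below are hypothetical, chosen purely for illustration, not any vendor's actual rates):

```python
def api_call_cost(input_tokens: int, output_tokens: int,
                  input_price_per_m: float, output_price_per_m: float) -> float:
    """Cost of one API call, with prices quoted per million Tokens."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000

# Hypothetical rates: 0.8 yuan per million input Tokens, 2 yuan per million output Tokens.
cost = api_call_cost(input_tokens=1_200, output_tokens=800,
                     input_price_per_m=0.8, output_price_per_m=2.0)
print(f"{cost:.6f} yuan")  # 0.002560 yuan
```

As with a utility meter, the bill scales linearly with consumption, which is why vendors compete on the per-million-token rate.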
After the price reduction, the consumption of inference computing power is indeed growing rapidly.
In August this year, Baidu disclosed on its second-quarter earnings call that daily API calls to its Wenxin large model rose from 200 million in May to 600 million in August, while daily Token consumption rose from 250 billion to 1 trillion over the same period.
ByteDance announced in August this year that, as of July, daily Token usage of its Doubao large model exceeded 500 billion.
Compared with May, its average daily Token usage had grown 22-fold.
Token prices have fallen by more than 90%.
In the short term this will cut cloud vendors' inference revenue.
But vendors hope that lowering the threshold for enterprise customers to experiment will drive computing power consumption up more than tenfold, ultimately delivering long-term revenue growth.
Half a year into the price war over inference computing power in China's large-model market, three basic facts stand out. First, the price war has pushed gross margins negative.
Several cloud vendors, including Alibaba Cloud and Baidu Intelligent Cloud, recently told us that before May this year the gross margin on domestic large-model inference computing power was above 60%, roughly in line with international peers.
After the major vendors cut prices in May, that gross margin turned negative.
Second, for models of comparable specification, domestic prices are generally only 20%-50% of OpenAI's.
Domestic large-model gross margins are therefore far below OpenAI's.
An August report by the international market research firm FutureSearch put the gross margin of OpenAI's flagship GPT-4 series at about 75% and that of its mainstay GPT-4o series at about 55%.
OpenAI's overall gross margin is at least above 40%.
Third, insufficient model capability is an important cause of the price war.
An executive in charge of one cloud vendor's large-model business believes that domestic flagship models generally lag OpenAI's GPT-4 series, so vendors must use price cuts to encourage customers to experiment.

As model prices keep falling, price is no longer what enterprise customers care about most.
What they care about most is model capability and real-world effect.
We have consulted the large model inference prices published on the official websites of Alibaba Cloud, Volcano Engine, Baidu Intelligent Cloud, Tencent Cloud, and OpenAI.
For models of comparable specification, domestic prices are generally only 20%-50% of OpenAI's.
Take three flagship models: Alibaba's Tongyi Qianwen-Max, Baidu's ERNIE-4.0-8K, and Tencent's Hunyuan-Pro, whose output prices per million Tokens are 120 yuan, 120 yuan, and 100 yuan respectively.
The OpenAI flagship they benchmark against, GPT-4-turbo, charges 210 yuan per million output Tokens (OpenAI's official price is 30 US dollars, converted to RMB at 1:7).
These three domestic models thus cost only about 50% as much as GPT-4-turbo.
Take three entry-level models: Alibaba's Qwen-Long, Baidu's ERNIE-Speed-Pro-128K, and Tencent's Hunyuan-Embedding, whose output prices per million Tokens are 2 yuan, 0.8 yuan, and 5 yuan respectively.
OpenAI's low-cost model gpt-4o-mini charges 4.2 yuan per million output Tokens (the official price is 0.6 US dollars, converted to RMB at 1:7).
Alibaba's and Baidu's entry-level models cost only 48% and 19%, respectively, of OpenAI's entry-level price.
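Those percentages follow directly from the quoted prices; a quick arithmetic check (all figures in yuan per million output Tokens, as listed above):

```python
# Output prices per million Tokens, in yuan, as quoted in this article.
gpt4o_mini = 4.2   # OpenAI gpt-4o-mini, USD price converted at 1:7
qwen_long = 2.0    # Alibaba Qwen-Long
ernie_speed = 0.8  # Baidu ERNIE-Speed-Pro-128K

print(f"Qwen-Long vs gpt-4o-mini: {qwen_long / gpt4o_mini:.0%}")         # 48%
print(f"ERNIE-Speed-Pro vs gpt-4o-mini: {ernie_speed / gpt4o_mini:.0%}")  # 19%
```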
The price war has driven large-model gross margins negative, but that has not slowed the pace of cuts.
According to our information, Alibaba Cloud and other leading cloud vendors are preparing yet another round of price reductions.
This round will be implemented in late September this year.
High-performance flagship models are the focus of this round of price reduction.
The aforementioned executive in charge of a cloud vendor's large-model business believes there is little room left to cut prices on low-cost, small-sized models; the last round already hit enterprise customers' "psychological floor".
The next question is whether each vendor's flagship models will keep getting cheaper.
Flagship models will also be further segmented into cost-effective versions that can solve most problems and high-quality, higher-priced versions for the hardest problems.
Why keep cutting prices when large-model inference computing power is already at negative gross margins?
The large cloud vendors are looking at the long-term trend: the computing power structure of cloud computing is changing drastically.
Seizing more inference computing power means seizing more of the incremental market.
The international market research firm IDC forecasts that from 2022 to 2027, China's general computing power will grow at a compound annual rate of 16.6%, while its intelligent computing power grows at 33.9%.
Over the same period, within intelligent computing power, the share of inference will rise to 72.6% while the share of training falls to 27.4%.
Cloud vendors are willing to give up short-term revenue for expected long-term growth.
In the short term, inference computing power brings in little revenue.
A technical staffer at a Chinese cloud vendor explained that each vendor's model-call revenue in 2024 will not exceed 1 billion yuan, a small sum against annual revenues in the hundreds of billions.
Cloud vendors will accept revenue and profit losses over the next one to two years.
Everyone is betting that large-model call volumes will grow more than tenfold over that period.
If so, long-term revenue growth will make up for the short-term losses.
He added that as customer demand grows, computing power costs will be spread ever more thinly.
The large-model business would then still have a chance to turn a profit.
Even if the bet fails, a group of model vendors will die in the price war, and the survivors will pick up the market they leave behind.
Different cloud vendors also bring different competitive calculations to the price war; for Volcano Engine, Alibaba Cloud, and Baidu Intelligent Cloud alike, it is a war that must be fought.
Volcano Engine is not yet among the top five in the Chinese public cloud market by share, but its revenue grew by more than 150% in 2023.
Large models are an important opportunity for it to catch up in the cloud market.
Tan Dai, president of Volcano Engine, told us in May this year that during a March visit to Silicon Valley he found US AI application startups following the pattern of China's early mobile internet from 2012 to 2014.
"Small AI application teams quickly achieve revenue and financing.
The Chinese market may show this trend in the future.
But the premise is that inference prices come down, lowering the threshold for trial and error."
Alibaba Cloud holds first place in the Chinese public cloud market.
Faced with competitors' price cuts, it must follow.
Liu Weiguang, general manager of Alibaba Cloud's public cloud business division, told us in June this year that Alibaba Cloud had run through multiple rounds of internal analysis and identified two contradictions.
Baidu Intelligent Cloud, for its part, takes AI as its core strategy.
A technical lead for Baidu's large-model business told us in July this year that large models are a battle Baidu must fight, and that the price war must be fought however difficult it is.
This strategy has produced tangible results.
Baidu Intelligent Cloud's revenue growth rate rose to 14% in the second quarter of 2024, its highest in two years.
Baidu's management disclosed on the second-quarter 2024 earnings call that large models' share of Baidu Intelligent Cloud revenue had risen from 4.8% in the fourth quarter of 2023 to 9% in the second quarter of 2024.
An AI strategy planner at a leading Chinese technology company offered this analysis: Volcano Engine is backed by ByteDance, whose advertising business can fund it.
Not being among the top five in cloud market share, Volcano Engine hopes to seize share through the price war.
Alibaba Cloud's revenue comes mainly from the four core public cloud products (computing, storage, networking, databases); low-priced models will drive customers' consumption of business data and in turn sales of those basic cloud products.
Large models are Baidu's core strategy, and Baidu was the earliest in China to build a large-model business.
When competitors opt for a price war, Baidu must follow.
Price is not the decisive factor.
The flip side of the negative-gross-margin price war is that low prices are not the main reason enterprise customers adopt large models.
The aforementioned executive in charge of a cloud vendor's large-model business believes cloud vendors cannot rely on burning money at a loss indefinitely to drive industrial adoption of large models.
Low-performance, low-price models are of little significance.
Insufficient model capability is the important reason behind the negative-gross-margin price war.
With domestic model-call prices falling sharply, price is no longer what enterprise customers care about most.
What they care about most is model capability and real-world effect.
An IT director at an insurance company agrees.
He said that IT spending in the financial insurance industry currently runs at about 3%-5% of company revenue.
Hardware takes roughly 80% of that IT spending, leaving only 20% actually available for digital transformation.
Adopting a new technology like large models therefore demands a clear input-output ratio.
Beyond the explicit model costs there are implicit ones: large models must be integrated with existing IT systems, business data must be prepared through data governance, and a cohort of AI product managers must be hired.
What he cares about most is the model's capability and actual effect.
Stanford University's Center for Research on Foundation Models (CRFM) runs a long-term global test and ranking of large models.
As of September 17, the massive multitask language understanding (MMLU) rankings placed in the top ten models from Anthropic's Claude 3.5 series (backed by Amazon), Meta's Llama 3.1 series, OpenAI's GPT-4 series (backed by Microsoft), and Google's Gemini 1.5 series.
Among Chinese large models, only Alibaba's Tongyi Qianwen 2 Instruct (72B) has entered the top ten.
Large-model technologists at several Chinese cloud vendors expressed the same view to "Caijing": in the large-model market, a low-performance, low-price strategy is not sustainable.
The ideal is a healthy, sustainable business cycle built on high performance at reasonable prices.
A benchmark with reference value is OpenAI.
As of September this year, OpenAI had 1 billion monthly active users and 11 million paying subscribers (10 million individual and 1 million corporate).
In May this year, OpenAI's management disclosed that the company's annualized revenue had reached 3.4 billion US dollars (about 23.8 billion yuan at 1:7).
Annualized revenue is monthly revenue multiplied by 12; subscription software companies collect monthly renewals with stable revenue expectations, so the annualized figure is commonly used.
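The annualized-revenue definition in that parenthesis is simple arithmetic; applying it to the disclosed figure at the stated 1:7 exchange rate:

```python
annualized_usd_bn = 3.4                                 # disclosed annualized revenue, USD billions
implied_monthly_usd_mn = annualized_usd_bn * 1000 / 12  # annualized = monthly revenue x 12
annualized_cny_bn = annualized_usd_bn * 7               # converted to RMB at 1:7

print(f"Implied monthly revenue: ~{implied_monthly_usd_mn:.0f} million USD")  # ~283 million USD
print(f"Annualized revenue in RMB: ~{annualized_cny_bn:.1f} billion yuan")    # ~23.8 billion yuan
```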
A recent report from the international market research firm FutureSearch, working from OpenAI's disclosed annualized revenue and paying-user structure, estimated the company's revenue mix: 10 million individual subscribers brought in 1.9 billion dollars (56%); 1 million corporate subscribers brought in 710 million dollars (21%); API calls brought in 510 million dollars (15%).
Even after several rounds of price cuts, OpenAI can still maintain a relatively healthy gross margin.
In April this year, OpenAI cut the output price of its flagship GPT-4-turbo by 67%.
In August, it cut the output price of its mainstay GPT-4o by 30%.
FutureSearch's August report put the gross margin of the GPT-4 series at about 75% and that of the GPT-4o series at about 55%.
OpenAI's overall gross margin is at least above 40%.
OpenAI's growth environment is unique.
It has ample computing power supply, a large To C (consumer) user base, and sits in the world's largest To B (business) software market.
Its success over the past two years has come from brute-forcing results with massive computing power.
Chinese companies have neither OpenAI's computing power conditions nor its financing environment.
Computing power is a key weakness for Chinese model makers.
A model technologist at a Chinese cloud vendor explained that over the past year, Chinese cloud vendors have paid more than 1.5 times list price to procure NVIDIA AI chips, keeping model computing costs high.
This caps model performance and also hinders the industrial adoption of large models.
A server dealer said that in 2023, an eight-card server fitted with NVIDIA H100/H800 series AI chips at times sold for more than 3 million yuan in the Chinese market, over 1.5 times NVIDIA's official pricing.
How can Chinese companies find a development path that suits them, given limited computing resources and high computing costs?
It takes careful budgeting.
Over the past two years, large-model development has followed the Scaling Law (proposed by OpenAI in 2020): model performance is driven mainly by the amount of compute, the number of model parameters, and the volume of training data.
The aforementioned executive in charge of a cloud vendor's large-model business said the core principle is to improve data quality and quantity within the Scaling Law's constraints, moderately reduce parameter counts, and adopt an MoE (Mixture of Experts, a model design that combines multiple specialist models for better performance) architecture to raise performance while cutting inference costs.
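To illustrate why an MoE design can cut inference cost, here is a toy Python sketch (the router and the "experts" are invented for illustration and resemble no vendor's actual architecture): each token activates only one of several experts, so only a fraction of the total expert parameters does work per token.

```python
import random

NUM_EXPERTS = 8

# Toy "experts": each is just a linear function y = w*x + b.
random.seed(0)
experts = [(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(NUM_EXPERTS)]

def route(x: float) -> int:
    """Toy router: deterministically bucket the input to one expert."""
    return int(abs(x) * NUM_EXPERTS) % NUM_EXPERTS

def moe_forward(x: float) -> float:
    """Run only the selected expert, so 1/NUM_EXPERTS of expert parameters are active."""
    w, b = experts[route(x)]
    return w * x + b

print(f"Active expert parameters per token: {1 / NUM_EXPERTS:.1%}")  # 12.5%
```

In a real MoE model the router is learned and typically picks the top few experts, but the cost logic is the same: total parameters (and hence quality headroom) grow with the number of experts, while per-token compute grows only with the number of active experts.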
In terms of specific business strategies, there are two plans.
OpenAI's three models this year, GPT-4, GPT-4-turbo, and GPT-4o, have evolved along these lines.
GPT-4o has fewer parameters than GPT-4, yet accurately solves most everyday problems.
GPT-4-turbo is used for harder problems.
OpenAI's latest o1-preview is the strongest performer: strengthened with reinforcement learning and no longer a single model, it reasons repeatedly before producing an answer.
The output prices of GPT-4o, GPT-4-turbo, and GPT-4 per million Tokens are 70 yuan, 210 yuan, and 420 yuan respectively (official prices of 10, 30, and 60 US dollars, converted at 1:7).
The negative-gross-margin price war is accelerating the elimination competition in the large-model market.
Many industry insiders told "Caijing" the same thing: this round of elimination will last one to two years, and only three to five foundation-model companies will survive.
An Xiaopeng, a member of the executive committee of the China Informationization 100 and director of the Alibaba Cloud Intelligent Technology Research Center, said in July this year that large models demand continuous investment, clusters of ten thousand or even a hundred thousand cards, and commercial returns.
Many companies lack such capabilities.
In the future, the Chinese market will hold only three to five foundation-model makers.
Developing large models means buying chips and servers and leasing land to build data centers.
This investment can run as high as tens of billions of yuan per year.
These costs show up in technology companies' capital expenditure.
Microsoft disclosed on its fiscal 2024 fourth-quarter earnings call that the quarter's 19 billion dollars of capital expenditure went almost entirely to computing power.
Over the past year (the third quarter of 2023 through the second quarter of 2024), capital expenditures at Alibaba, Tencent, and Baidu reached 23.2 billion yuan, 23.1 billion yuan, and 11.3 billion yuan, up 77.1%, 154.1%, and 46.9% respectively, all driven by computing power investment.
Beyond tens of billions of yuan in ongoing computing power investment, the large-model inference business also requires subsidies of hundreds of millions of yuan a year.
A senior executive at a Chinese cloud vendor analyzed it this way: negative gross margins on model calls mean that, in the short term, more calls mean bigger losses.
At current inference usage, the leading cloud vendors fighting the price war will have to subsidize more than one billion yuan of large-model inference computing power consumption in 2024.
Alibaba Cloud, Volcano Engine, Baidu Intelligent Cloud, and Tencent Cloud can rely on their groups' funding to fight the large-model price war; for large-model startups, holding on is far harder.
The aforementioned AI strategy planner at a leading Chinese technology company believes Alibaba Cloud and Volcano Engine have the deepest pockets in this round.
Alibaba can draw on cloud profits, and Volcano Engine on funding from ByteDance's advertising business.
On price, Baidu cannot match Alibaba and ByteDance.
But Baidu's Wenxin models are technically strong, and a cohort of customers will pay for that technology.
This helps Baidu withstand the price war.
In the short term, large-model startups must rely on big tech companies and financing to survive.
A technical staffer at a large-model startup said in September this year that the "Five Little Tigers" of domestic large models, Zhipu AI, Baichuan Intelligence, Moonshot AI, 01.AI, and MiniMax, have all taken investment from Alibaba.
In one form of investment, the amount is paid in computing power, and the portfolio company uses Alibaba Cloud's computing power.
Whether the "Five Little Tigers" can survive depends in part on whether Alibaba chooses to keep investing.
Both the aforementioned cloud-vendor technologists and the startup technologist believe that Chinese large-model startups face a test over the next two years.
It is hard for them to break through in the foundation-model market, and three ways out remain: become a model developer for government and enterprise projects, pivot to To B vertical-industry models, or pivot to the To C application market.
In fact, the market has already begun to differentiate.
Zhipu AI is winning a large number of government and enterprise projects, while Moonshot AI focuses solely on the To C market.