Large Models Are Just Another "Converting Rice Paddies to Mulberry Fields" (Gemini 2.5 Pro Translated Version)

Many years ago, when a lethal dose of curiosity and the urge to explore drove me into the “Cult of Artificial Idiocy” (a sarcastic name for AI), I almost certainly didn’t realize that even on my deathbed, somewhere in the future, I would still want to do exactly what I’m doing today, in November 2024: dissuade every Ph.D. student who still harbors illusions about large models.

What makes “Converting Rice Paddies to Mulberry Fields” what it is has nothing to do with whether it takes the form of Jiajing’s mulberry trees, Guangxu’s Westernization Movement, the Great Leap Forward’s steel, “Academician Chen’s” chips, “Academician Pan’s” quantum projects, the semiconductor “Big Fund,” or internet companies’ large models. There are some things whose inherently foolish logic some of our countrymen simply cannot escape: the top indulges in delusional fantasies, the upper ranks scheme against and deceive one another, the middle ranks fawn on their superiors, and the bottom trudges forward under the load.

Looking back a bit more than 50 million seconds (roughly a year and a half), there was the story of a $50 million venture that aimed to become China’s OpenAI. Although I admit I was very actively and enthusiastically sarcastic about it at the time, I can say with a clear conscience that the founder’s depression was definitely not caught from me. Back then I simply wrote this: “I don’t doubt their determination to become OpenAI in the slightest. It’s just that they want to be the OpenAI that is already successful and raking in money, not the OpenAI that toiled in obscurity, honing its craft for ten years. Recall when Japan protested China’s establishment of an Air Defense Identification Zone; our Ministry of Foreign Affairs retorted: Japan can abolish its own ADIZ first, and China will consider abolishing ours fifty years later (because Japan’s had by then been in effect for 50 years).” Even then, I felt this venture wasn’t very smart, because the last Chinese AI incident remembered by a number seems to have been the 40-million-yuan “Green Dam Youth Escort” (a notorious content-filtering software).

If you find yourself thinking, “Does the top level really not understand what’s going on?”, I believe that is certainly not the case. Jiajing may not have known how long a mulberry sapling takes to grow, and the magnates on the Forbes list probably don’t know that Megatron has four kinds of parallelism, but the information and intelligence they can access far exceed those of us “training serfs” who just squeezed out another 0.3-0.5 points. It’s just that, for them, this thing is simultaneously a presentable PowerPoint slide, a high-quality, self-reliant project that aligns with policy, and a combo move for whipping up retail financial enthusiasm through (15 characters deleted here, implying some sensitive or dubious means). And what if (I admit there is an element of gambling here), just what if, this year they manage to weave an extra 200,000 bolts of silk, and thereby instantly lay out a mattress on which to lie back and collect money for the next ten years?
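(For the uninitiated: the four are tensor, pipeline, data, and sequence parallelism, and they compose as a simple factorization of the GPU budget. A minimal sketch, with illustrative numbers and flag names that assume Megatron-LM’s CLI:)

```python
# A minimal sketch of how Megatron's four parallelism dimensions factor a GPU
# budget. The flag names in comments assume Megatron-LM's CLI; the numbers
# are made up. Data parallelism takes whatever GPUs are left over.

world_size = 1024         # total GPUs in the job
tensor_parallel = 8       # --tensor-model-parallel-size: split each layer's matmuls
pipeline_parallel = 16    # --pipeline-model-parallel-size: split layers into stages
sequence_parallel = True  # --sequence-parallel: shard activations along the sequence
                          # axis within each tensor-parallel group (no extra GPUs)

# Data parallelism is implicit: full replicas of the tensor x pipeline grid.
data_parallel = world_size // (tensor_parallel * pipeline_parallel)
assert tensor_parallel * pipeline_parallel * data_parallel == world_size
print(f"data-parallel replicas: {data_parallel}")  # -> 8
```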

Then the matter reaches the hands of the first batch of implementers, say, the cabinet or the VPs. Here it begins to take physical form, but that form usually does not revolve around the top level’s delusion. After all, people who reach this position, even if they have no idea how good the J-20 “Mighty Dragon” actually is, would dismiss the fantasy the moment they saw the air force fielding a sky full of Boeing 767s on the strength of “actual combat records.” So for them, the whole thing revolves around exactly two central ideas. First: whether the job gets done well is not important; who takes the blame for it not being done well is very important. Second: whether the job gets done well is not important; making sure everyone who should benefit does so while it’s in my hands is very important. After all, even if the “what if” comes true, I won’t be the one lying back and collecting money. But as long as no one achieves the “what if,” I can still be the one kneeling here and collecting money. Although these two thoughts, intertwined, became the rope that strangled China’s version of OpenAI, I subjectively can’t bring myself to hate this group of people. Half of their power comes from above, and half from the design of the system. Unless they are willing to give that power up, their existence, much like a [SEP] token, is what holds the chat template together; without them, the model would inevitably start generating gibberish.

Since we’re talking about power, let’s climb down this rope to the middle layer, whose power derives almost 100% from the level above. So, for (most of) them, fulfilling the upper level’s wishes 100% is their only way to survive, even if that means actually flooding nine counties (an allusion to the “Converting Rice Paddies to Mulberry Fields” plot) or processing millions of data points across dozens of major categories and thousands of subcategories to overfit a handful of test sets. “Scaling up the data is Boss X’s intention; it benefits the department above and you below. Why are you so reluctant to do it?” “As for being responsible for our own profits and losses: we’ll do large models and we’ll do new business, and the dilemma will resolve itself.” “Where in this world is there a large model that always follows instructions? If these bad cases aren’t reported upward selectively, if everything gets reported, then everyone gets a 325 (a low performance rating).” “Processing 10,000 data points is a number; processing a million data points is also a number. You and I both have to do this job.” “To score one point higher than last time on this evaluation, we must process another million data points targeting the weak areas. Let everyone suffer a bit; I’ll take the blame for the 325s.” Examples like these are too numerous to count.

I’ve lamented before that China has a large crowd of “chosen ones” who rode the open-source boom and are now wreaking havoc in the large model field. But let me make a few more outrageous claims: 95% of the previous generation of Chinese AI scholars were already “research-dead” by 2020, the year scaling began to dominate the field. As a result, the mindset of a large number of highly cited scholars, whose main craft was picking inductive biases and stacking various little tricks to boost scores, fell seriously behind where industry had gone. And when these highly cited scholars went on to “guide” scaling inside major model teams, they produced great heaps of “industrial waste” (産業ゴミ), the kind of result best described as “trying to draw a tiger and ending up with a dog.” Then they shamelessly touted this industrial waste in the “top three Chinese journals,” claiming to “significantly outperform GPT-4o in Chinese scenarios.” So, you mean to say your data wasn’t generated with GPT-4o? Oh, right, it wasn’t. It was generated with Claude.

Alas, it’s the bottom that suffers. All year round, they can only hope that Llama open-sources something new.



