Labor-Intensive Large Models Have No Future (Gemini 2.5 Pro Translated Version)

Ever since Louis XIV ascended the throne 380 years ago, people have constantly come to ask me: what is the biggest difference between the “Turbo” type and the “Omni” type models?

Actually, I have three tiers of answer templates prepared for this question. If the other party is a highly cited scholar, I pretend to answer cautiously, “The Turbo type has only one modality, but the Omni type has multiple modalities.” They will then narrow their eyes, already squeezed into slits by their chubby faces, and with a foolish grin condescendingly instruct me, “Yes, there are four modalities, do you know what they all are?” This is what’s known as pitting your lowest-tier horse against their top-tier one (as in Tian Ji’s horse race: concede the mismatch and let the arrogance show). If the other party is a newly joined intern, I robotically recite some technical details discussed in their papers and tell them that in another six months we will have an open-source Omni type for them to experiment with; this is the middle-tier horse reliably beating the lower-tier one. But if I’m facing you, my most sagacious and diligent readers, I will try to give the answer that matters most:

The Turbo type is a technology-intensive product, while the Omni type is a labor-intensive product.

The confidence with which I say this comes from GPTs, which may well be on the verge of dying out. Although some of the problems are not inherent to GPTs themselves (e.g., the Bing-based search functionality being crippled by websites that game search rankings), the core reason is still that GPT itself cannot follow the varied, complex instructions of the real world. This means that, among the countless GPTs out there, only a tiny minority, those confined to very specific domains and repetitive, standardized tasks, actually get executed well. In other words, our technology has not yet evolved to the point of providing a precise, general-purpose instruction interpreter (the core component of a GPT); to make such an interpreter fully functional we would need to build a correction system around it, something akin to QEC (Quantum Error Correction), and building that system is itself non-standardized work that depends on experience and costs time. So, after the initial hype faded, GPTs gradually disappeared from view, which amounts to the bankruptcy of the “provide a technology-intensive base model and let users customize it” approach. At this point OpenAI had no choice but to follow in the industry’s footsteps and train an Omni type: use this model enough and you will come to fully appreciate the results of optimizing for “niche domains and fine-grained categories.” Even if it is not as good as the Turbo type at understanding longer instructions, it is genuinely fast and capable on common (and especially leaderboard-friendly) queries.

And why has the industry taken these steps? According to my conjectural model above, the sequence of events probably went like this: initially, someone said they were two months behind the others. But words need proof, so let’s do some benchmarking. Upon checking the academic leaderboards, however, they were shocked to find that they had inadvertently crushed everyone else a year ago, yet the actual user experience was nearly as bad as a widely mocked internet meme (think “Kun Kun”). So the solution became: divide user scenarios into a dozen major categories and hundreds of subcategories. That way, as long as we win any category by even a slight margin, we get a point; the subcategories where we lose badly get merged into “other scenarios.” Thus we can finally feel like winners again on the “User Capability Experience Leaderboard.” Development consequently became oriented around per-category optimization. Each category was assigned to a few people, who “picked” some “reference answers” from other people’s models, fed them into a self-developed training framework, and used a self-developed PPO algorithm (affectionately, “Pippi O”) to grind the scores upward for each little category. Gradually, the production line for new-quality labor-intensive large models was born.
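
To make the leaderboard arithmetic concrete, here is a minimal, tongue-in-cheek sketch in Python of the scoring scheme described above. The function, category names, margins, and threshold are all invented for illustration; they correspond to no one’s actual evaluation pipeline.

```python
# A caricature of the "User Capability Experience Leaderboard" arithmetic.
# All categories, margins, and the merge threshold are made up.

def leaderboard_points(margins_by_category, merge_threshold=-0.10):
    """Award one point per category won by any margin; fold categories
    lost badly (margin below merge_threshold) into a single 'other
    scenarios' bucket so they show up as at most one loss."""
    points = 0
    other_scenarios = []
    for category, margin in margins_by_category.items():
        if margin > 0:
            points += 1                       # a 0.1% win counts the same as a rout
        elif margin < merge_threshold:
            other_scenarios.append(category)  # quietly absorbed into "other scenarios"
    return points, other_scenarios

# Illustrative numbers only: win three niches narrowly, lose two broad ones badly.
margins = {
    "couplet writing": 0.002,
    "emoji chit-chat": 0.010,
    "weather small talk": 0.004,
    "long-document reasoning": -0.35,
    "multi-step tool use": -0.40,
}
points, hidden = leaderboard_points(margins)
print(points, hidden)  # 3 points; the two routs vanish into "other scenarios"
```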

Setting aside technology and ideals for now, let’s talk about this production model itself. I have come to realize that its biggest advantage is that it gives capitalists and their managers a sense of security:

  1. It effectively ties the politically correct notion of “real-world application” (落地, luòdì) to their line of work. Human creativity is, on average, genuinely poor, and that includes the diversity and complexity of the questions people ask. So there is nothing wrong with optimizing for what is, in essence, a targeted repeater (i.e., common, specific queries). If we didn’t do this, nobody knows what we would end up with, the user experience would plummet, and we would stray badly from the objective business truth that real-world application is what makes money.
  2. It effectively provides material for sponsored articles (“soft content”), and the spending on soft content can, at some future point of financial consolidation, be “converted” into goodwill. This forms a supposedly virtuous cycle: the more you spend on soft content, the higher the valuation; the higher the valuation, the more attention you receive; the more attention you receive, the less you need to spend on soft content. It also gives the major evaluation agencies a stable monetization model, creating a healthy ecosystem. After all, boss, you wouldn’t want your own large model to score second to last in our programming category, would you?
  3. It effectively creates an identifiable intangible asset: data. From an accounting perspective, the hardware behind large models depreciates within a few years, the frameworks are open source turned “self-developed,” the people leave or burn out after a while, and a trained model is outdated in two months. Only data, high-quality data with a good classification system, is the solid, never-impaired asset on my books.

OpenAI, however, actually has the standing to make a labor-intensive model.

Firstly, they genuinely have a technology-intensive model. Not only can they acquire vast amounts of data almost cost-free, since you call GPT-4 to process your own data and a large part of it consists of real user cases from the wild, they can also run strictly logits-based distillation experiments. The Omni type is OpenAI’s Omni type, but it is your labor-intensive type. (I also think the name “Omni” is debatable; if I were Sam Altman, I’d call it the “Lightwing Type” Close Support Cruel GPT.) Secondly, OpenAI has a large number of algorithm people who truly understand the technology: many influential algorithms, GPT, PPO, CLIP and so on, originated there. By analogy, you could say they have a complete technology supply chain. In contrast, your company’s “Senior/Super”-titled experts, researchers, and scientists are mostly “big lovelies” (used sarcastically here for the out-of-touch) who feasted on the open-source dividends of CV/NLP over the past decade. The “technology” they can supply amounts to tricks for squeezing another 0.X% out of some benchmark that has already been beaten to death. Their whole way of thinking is out of step with the times; some cannot even read, let alone derive, the most basic formulas. In the end everyone appears to be working very hard, yet all they do, day after day, is call other people’s models to process data, centered on some fixed leaderboard. Lastly, OpenAI genuinely needs a lightweight model to free up compute for new work. At that point a model that performs reasonably well in niche areas becomes especially cost-effective: after all, if users find that the Omni type cannot understand the complex instructions they use for data processing, they will naturally turn back to the Turbo type.
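
For readers who want to see what “strictly logits-based” distillation means in practice, here is a minimal PyTorch sketch, assuming white-box access to the teacher’s logits; the shapes, temperature, and random placeholder tensors are illustrative, not anyone’s actual recipe. API-only imitators never see these logits and can only fine-tune on sampled text, which is precisely the gap described above.

```python
import torch
import torch.nn.functional as F

def logits_distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence between temperature-softened teacher and student
    token distributions, averaged over all token positions.

    Both tensors have shape (batch, seq_len, vocab_size).
    """
    s = F.log_softmax(student_logits / temperature, dim=-1).flatten(0, 1)
    t = F.softmax(teacher_logits / temperature, dim=-1).flatten(0, 1)
    # The T^2 factor keeps gradient magnitudes comparable across temperatures.
    return F.kl_div(s, t, reduction="batchmean") * temperature ** 2

# Placeholder tensors standing in for real student/teacher forward passes:
# 2 sequences, 16 tokens, a 32k-entry vocabulary.
student_logits = torch.randn(2, 16, 32_000, requires_grad=True)
with torch.no_grad():
    teacher_logits = torch.randn(2, 16, 32_000)

loss = logits_distillation_loss(student_logits, teacher_logits)
loss.backward()  # gradients flow only into the student
print(loss.item())
```

The point of the contrast: this loss uses the teacher’s full next-token distribution at every position, whereas harvesting “reference answers” over an API yields only one sampled string per prompt.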

Of course, no one will admit they are making labor-intensive large models. And yet: your servers are Supermicro, your switches are Mellanox, your CPUs (jokingly, “KFC”) are Intel, your accelerator cards come from Old Huang’s family (Nvidia), your operating system is open-source Linux, your programming language is open-source Python, your coding tool is open-source VS Code, your computation framework is open-source PyTorch, your training stack is open-source Megatron, DeepSpeed, TransformerEngine, and FlashAttention, your pre-training data is mostly Common Crawl, and your teacher model for distillation is the Omni type. Alright, now you tell me, where exactly is your “technology intensity”?

Finally, this article is purely a product of personal conjecture. Please do not take it personally or assume it refers to any specific entity. Additionally, a salute to the few domestic teams still pursuing technology and the open-source path.

Further Reading

Qin Rongsheng. “Discussion on Management and Taxation Issues of Data Resources in Accounting and Reporting” [J]. 税务研究 (Taxation Research), 2024, (05): 29-33. DOI: 10.19376/j.cnki.cn11-1011/f.2024.05.006.



