Reflections and Private Musings After 18 Months in the Large Model Pit (Gemini 2.5 Pro Translated Version)

A few days ago, after a meeting attended by senior management, a technical lead in our group launched straight into a rant: “There’s no planning whatsoever; all day long they just obsess over gaming a few leaderboards to fool the boss.” I instinctively hurried to pull him back, whispering, “Your voice is too soft, the boss can’t hear you. Later I’ll take you to the building’s security room, and you can use the emergency announcement speaker to tell everyone in the building.” He seemed to realize something and asked me what equipment and logistical supplies a dozen or so people would need to storm the security room. And so the content of the meeting was happily forgotten.

Back at my workstation, I suddenly realized I’ve been in this “pit” (the LLM field) for a year and a half. Looking back, I have indeed gained some things: an understanding of core implementations like Megatron-LM, DeepSpeed, TransformerEngine, and FlashAttention, and a grasp of the worldview of reinforcement learning and how it fuses with that of optimization. But if I had to describe my current state of mind in one phrase, it would undoubtedly be “jumping into pits won’t save the world.” Quitting the pit to become a writer won’t save it either. At its most fundamental, our industry operates like this: the open-source community creates a batch of “chosen ones” (people favored by destiny); these chosen ones spontaneously coalesce into “makeshift troupes” (often amateurish or opportunistic teams); and these makeshift troupes are then gradually eliminated by the open-source community. In this cycle, the troupes that should have been eliminated instead try to erect “memorial archways” (symbols of virtue or legitimacy, here ironic) through closed-sourcing and commercialization. Hence the peculiar spectacle of an industry riddled with pits.

If I had to give this “pit” a clear definition, it would roughly be “a competitive system characterized by homogenization, formulization, low efficiency, and low innovation,” quite reminiscent of fields like low-end chips, commercial urban districts, adult education, new energy vehicles, and supply chain finance. The only difference is that large models carry an extra layer of “window paper” (a thin veil of mystery): their black-box nature. In other words, they are not designed so much as explored and discovered, like phenomena of nature, and they also have an extremely high financial barrier to entry. To the general public, this creates a sense of estrangement similar to the Large Hadron Collider probing the fundamental principles of the universe. At the same time, large language models have a strongly humanistic character, much like economics, which in turn grants these makeshift troupes a high degree of fault tolerance.

I realized long ago that the current path of QKV attention + next-token prediction + scaling is nearly at its end. It’s not that scaling can no longer contribute, but that its returns are far from justifying the investment. Moreover, one of my more controversial claims is that scaling will make large models ever more like large models: “rich and mediocre.” This is why you can feel that a response was generated by a large model. To be fair, rich and mediocre answers are not entirely worthless; they can at least serve as raw material for creation. But for intelligence itself this quality is meaningless, to say nothing of the makeshift troupes that, under the banner of creating intelligence, end up producing nothing but a pile of rich and mediocre products.
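
For concreteness, here is a minimal sketch of the two ingredients named above, scaled dot-product QKV attention and the next-token objective. This is my own illustration in PyTorch, not anyone’s production code; “scaling” is then just making these pieces, and the data fed through them, larger.

```python
import torch
import torch.nn.functional as F

def qkv_attention(x, w_q, w_k, w_v):
    # Single-head scaled dot-product attention with a causal mask:
    # the QKV core of the "current path". x has shape (seq_len, d_model).
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = (q @ k.T) / (k.shape[-1] ** 0.5)
    # Causal mask: position t may only attend to positions <= t.
    mask = torch.triu(torch.ones_like(scores, dtype=torch.bool), diagonal=1)
    scores = scores.masked_fill(mask, float("-inf"))
    return F.softmax(scores, dim=-1) @ v

def next_token_loss(logits, token_ids):
    # Next-token prediction: logits at position t are scored against
    # the actual token at position t + 1.
    return F.cross_entropy(logits[:-1], token_ids[1:])
```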

Of course, even if the above path is dead, or if, as some more pessimistic views hold, connectionism itself is dead, large models can still keep developing. Clearly, beyond creative needs, improving productivity also demands the ability to “understand complex instructions and output precisely.” In my mind, this demand will drive the development of “connectionist symbolism” (neuro-symbolic approaches), along the lines of the collaboration between the formalizer network, Lean (the theorem prover), and the solver network in projects like AlphaProof. This is the biggest opportunity for large models in the next few years. Perhaps “Strawberry” and “Orion” (likely codenames for large models) are similar things, but unfortunately they are closed-source. The makeshift troupes, in their bones, don’t really dare to touch this direction, because deep down they know very well what made them successful (namely, open source and simpler approaches).
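
To make that division of labor concrete, here is a deliberately naive sketch of such a propose-and-verify loop. Everything in it is my own illustration: `formalizer` and `solver` are hypothetical callables standing in for the two networks, and the only external dependency assumed is a local Lean 4 toolchain with `lean` on the PATH. The point is that the symbolic checker, not the network, is the judge of correctness.

```python
import subprocess
import tempfile

def lean_accepts(source: str) -> bool:
    # Ground-truth check: ask Lean itself to type-check the candidate file.
    # Assumes a local Lean 4 toolchain with `lean` on the PATH.
    with tempfile.NamedTemporaryFile("w", suffix=".lean", delete=False) as f:
        f.write(source)
        path = f.name
    return subprocess.run(["lean", path], capture_output=True).returncode == 0

def prove(informal_statement: str, formalizer, solver, max_attempts: int = 8):
    # `formalizer` and `solver` are hypothetical networks:
    # natural language -> Lean statement, and
    # Lean statement -> candidate proof script.
    theorem = formalizer(informal_statement)
    for _ in range(max_attempts):
        candidate = solver(theorem)
        # The symbolic checker, not the network, decides what counts as correct.
        if lean_accepts(theorem + "\n" + candidate):
            return candidate
    return None  # no verified proof within the attempt budget
```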

So, you see I’ve been criticizing these makeshift troupes for so long; let me also give them a qualitative definition: a low-spec version of an institutional circle/system. Its low-spec nature is manifested in the following three aspects: immature systems and more rule by man (rather than rule of law); greater influence of “divine power” (the open-source community); and relatively free movement between “kingdoms” (companies/teams) with a lack of constraints on people’s behavior. I cannot elaborate on the many problems here; firstly, it would attract too much hate, and secondly, it would “charge the tower” (internet slang for criticizing authority and risking censorship). The only advice I can give to the younger folks is: don’t live your life like Gao Hanwen (an idealistic but ultimately tragic character in the TV drama “Ming Dynasty 1566,” manipulated within a corrupt system), or try not to jump into this pit from the beginning, if your ideals are truly poetry, intelligence, and the distant horizon. On another note, I actually think some of the domestic open-source contributors are doing quite well, like DeepSeek (this is not a sponsored post, because they rejected my resume directly due to my age). The main reasons are: firstly, they are backed by the financial sector, have their own specific value proposition, and don’t need to worry about money; secondly, there are genuinely some good-natured, idealistic “foolish kids” in there striving for their dreams, creating a good technical atmosphere.

At this point, in a daze, I feel like a biological intelligence welded shut inside a “Type Zero” (零式, likely a reference to the Mitsubishi A6M Zero fighter, or a similar “final,” perhaps sacrificial, integrated entity from pop culture). In the end, I don’t know how I will be buried along with this large model industry. I hope something good happens tomorrow.



