From Optimization to Strategy — King Wen Experiments

A Map of Failures

Over the course of this research program, the King Wen sequence was tested in five distinct configurations within continuous optimization, and once more after the work crossed the boundary into strategic games.

As a learning rate schedule: it destabilized gradient descent. As a curriculum ordering: it performed worse than random shuffling. Under a sweep of random seeds: the starting point washed out, fixing a noise floor that no King Wen effect ever cleared. Across different hardware platforms: its apparent curriculum effect proved to be a platform artifact, vanishing on one machine. Through adaptive selection: it proved no better than letting a simple algorithm choose.

Five experiments, five negative results — all in continuous optimization. Then the first test in a new domain, a simplified three-state wargame, where the King Wen prior reduced rather than improved survival. The honest conclusion is specific: the King Wen sequence's properties — high variance and negative lag-1 autocorrelation — are detrimental to continuous optimization. Gradient descent thrives on smooth, predictable signals. The sequence provides the opposite.
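
The two properties named above can be measured directly. The following is a minimal sketch, assuming the sequence has already been encoded as a list of numbers; the two test sequences are illustrative stand-ins, not the King Wen data, and the function name is hypothetical.

```python
from statistics import fmean, pvariance


def lag1_autocorrelation(values):
    """Sample lag-1 autocorrelation: neighbour covariance over total variance."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    mean = fmean(values)
    denom = sum((v - mean) ** 2 for v in values)
    if denom == 0:
        raise ValueError("a constant sequence has undefined autocorrelation")
    numer = sum((values[i] - mean) * (values[i + 1] - mean) for i in range(n - 1))
    return numer / denom


# Illustrative stand-ins only (assumption): neither list is the King Wen sequence.
smooth_ramp = [0.1 * i for i in range(20)]                 # each step close to the last
zigzag = [1.0 if i % 2 == 0 else 0.0 for i in range(20)]   # flips on every step

print(pvariance(smooth_ramp), lag1_autocorrelation(smooth_ramp))
print(pvariance(zigzag), lag1_autocorrelation(zigzag))
```

The ramp has a lag-1 autocorrelation of about 0.85, the kind of smooth signal gradient descent handles well. The zigzag comes out at about -0.95: each value predicts that the next one will swing the other way.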

This is a clear boundary. Whether anything lies on the other side of it remains an open question.

The Honest Case — and Its Problems

The theoretical argument for testing the King Wen sequence in strategic decision-making goes like this: in game theory, unpredictability is a virtue. A strategy that an opponent cannot model is a strategy that cannot be exploited. The King Wen sequence is unpredictable. Therefore, it should help in games.

This argument has real problems that must be stated plainly.

First, uniform random play is also unpredictable, and maximally so. The curriculum experiments already showed that random shuffling outperforms King Wen at decorrelation. If the benefit comes from unpredictability alone, random will beat King Wen in games just as it beat King Wen in curricula.

Second, game theory's answer to 'what is the optimal unpredictable strategy' is already known. It is the Nash equilibrium mixed strategy, which in games such as matching pennies and rock-paper-scissors is simply uniform random. King Wen imposes structure on randomness. Structure means pattern. Pattern means exploitability. By this logic, King Wen can at best match random at avoiding exploitation, and it does worse against any opponent that detects the pattern.

Third, any opponent using Bayesian inference will eventually learn the King Wen distribution and exploit its non-uniformity. The mapping from game state to hexagram to action induces a fixed, non-uniform distribution over moves. Given enough games, that distribution can be estimated and countered.
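
The third objection can be made concrete with a toy game. The sketch below uses rock-paper-scissors as a stand-in for the wargame (an assumption, not the article's simulation): an observer keeps a Dirichlet posterior over a fixed opponent's moves and plays the best response to its current belief. All names and the biased distribution are hypothetical.

```python
import random

MOVES = ("rock", "paper", "scissors")
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}  # key beats value


def payoff(mine, theirs):
    """+1 for a win, -1 for a loss, 0 for a tie."""
    if mine == theirs:
        return 0
    return 1 if BEATS[mine] == theirs else -1


def posterior_mean(counts, pseudo=1.0):
    """Mean of a Dirichlet posterior with a symmetric prior of `pseudo` per move."""
    total = sum(counts.values()) + pseudo * len(MOVES)
    return {m: (counts[m] + pseudo) / total for m in MOVES}


def best_response(belief):
    """Move with the highest expected payoff against the believed distribution."""
    def expected(mine):
        return sum(p * payoff(mine, theirs) for theirs, p in belief.items())
    return max(MOVES, key=expected)


def exploit(fixed_policy, rounds, seed=0):
    """Average payoff per round for an observer who counts moves and best-responds."""
    rng = random.Random(seed)
    counts = {m: 0 for m in MOVES}
    weights = [fixed_policy[m] for m in MOVES]
    total = 0
    for _ in range(rounds):
        mine = best_response(posterior_mean(counts))
        theirs = rng.choices(MOVES, weights=weights)[0]
        total += payoff(mine, theirs)
        counts[theirs] += 1
    return total / rounds


biased = {"rock": 0.5, "paper": 0.3, "scissors": 0.2}   # hypothetical fixed prior
uniform = {m: 1 / 3 for m in MOVES}
print(exploit(biased, 5000), exploit(uniform, 5000))
```

Against the biased policy the observer settles on paper, which earns 0.3 per round in expectation. Against the uniform policy every response has an expected payoff of zero, so there is nothing to learn.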

These are not hypothetical objections. They are the direct predictions of the same experimental methodology that produced the five negative results. Intellectual honesty requires stating them before running the next experiment.

Where a Prior Might Matter

The honest case for King Wen is narrower than 'unpredictability helps in games.' It is this: in the early stages of learning, before an algorithm has gathered enough experience to compute its own strategy, the initial bias matters.

Consider a new player in an unfamiliar game. They must act before they understand. A uniform random strategy wastes early moves on actions that are obviously bad — attacking a much stronger neighbor, allying with a state that has betrayed you twice. A structured prior that encodes even crude intuitions about when to be aggressive and when to be cautious could produce better outcomes during the learning phase.

The trigram mapping attempts exactly this. Earth trigrams (坤) bias toward defensive actions. Heaven trigrams (乾) bias toward aggressive actions. Water trigrams (坎) bias toward adaptive, cautious play. These are not arbitrary assignments — they draw on three millennia of interpretive tradition about what these symbols mean in the context of human decision-making.

The question is not whether King Wen produces the optimal strategy. It does not — no fixed prior can. The question is whether King Wen produces a better warm start than random initialization, such that a learning algorithm converges faster or explores more productively from a King Wen starting point.
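
One way to read "warm start" is as a set of pseudo-counts that later evidence can outweigh. The sketch below assumes that reading. The three action names, the numeric weights, and the `WarmStartPolicy` class are all hypothetical: the text specifies only the direction in which each trigram leans, not how strongly.

```python
ACTIONS = ("defend", "attack", "adapt")

# Hypothetical weights (assumption): only the direction of each bias comes from the text.
TRIGRAM_BIAS = {
    "earth": {"defend": 0.6, "attack": 0.1, "adapt": 0.3},   # 坤, leans defensive
    "heaven": {"defend": 0.1, "attack": 0.6, "adapt": 0.3},  # 乾, leans aggressive
    "water": {"defend": 0.3, "attack": 0.1, "adapt": 0.6},   # 坎, leans adaptive
}
UNIFORM = {a: 1 / len(ACTIONS) for a in ACTIONS}  # the random-initialization baseline


class WarmStartPolicy:
    """Action probabilities from prior pseudo-counts plus accumulated evidence."""

    def __init__(self, prior, prior_strength=10.0):
        # prior_strength is how many observations the prior is "worth".
        self.pseudo = {a: prior_strength * prior[a] for a in ACTIONS}
        self.evidence = {a: 0.0 for a in ACTIONS}

    def reinforce(self, action, credit=1.0):
        """Credit an action that worked; enough evidence outweighs the prior."""
        self.evidence[action] += credit

    def probabilities(self):
        total = sum(self.pseudo.values()) + sum(self.evidence.values())
        return {a: (self.pseudo[a] + self.evidence[a]) / total for a in ACTIONS}


policy = WarmStartPolicy(TRIGRAM_BIAS["earth"])
print(policy.probabilities())          # starts at the Earth prior
for _ in range(1000):
    policy.reinforce("attack")         # experience contradicts the prior
print(policy.probabilities())          # the prior has been outweighed
```

In this framing the prior only shapes early behavior. The comparison of interest is how quickly a policy started from a trigram prior reaches good play relative to one started from `UNIFORM`.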

This is a weaker claim than the original hypothesis. It is also more testable and more honest.

惻隱之心，仁之端也。
— 孟子・公孫丑上
The heart of compassion is the sprout of benevolence.

Article 2 in this series framed the seed sensitivity experiment as a test of Mengzi's 'four sprouts': innate dispositions that require cultivation to develop. The warm-start hypothesis brings this full circle. King Wen is not the answer to strategic decision-making. It is a sprout, an initial disposition that training develops, refines, or discards. The question is whether this particular sprout, shaped by three millennia of human interpretation, produces a better starting point than a random one.

The Game Ahead

The next phase of the research builds a seven-state Warring States simulation faithful to the historical topology: Qin, Han, Wei, Zhao, Qi, Chu, and Yan. Unlike the three-state triangle that killed Han in five rounds, the seven-state game recreates the diplomatic landscape where Han actually survived for 223 years — buffer-state value, shifting alliances, distant partners, and the combinatorial diplomacy that Su Qin and Zhang Yi wielded as weapons.

Han will again serve as the experimental subject. But the framing changes. The question is no longer 'Does King Wen make Han win?' It is: 'Does King Wen give a learning algorithm a better starting point for discovering Han's survival strategy?'

The controls remain rigorous. Scrambled King Wen sequences test whether the specific ordering matters or any fixed structure helps. Random priors test whether any non-uniform bias helps. Pure algorithmic agents provide a ceiling — how well can brute computation do without any human-interpretable structure?
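
The first two control arms can be generated mechanically. The sketch below is a minimal version under stated assumptions: the hexagrams are represented by the placeholder indices 1 to 64 rather than by any real encoding, and both function names are hypothetical.

```python
import random


def make_control_orderings(sequence, n_scrambles, seed=0):
    """Return the original ordering plus `n_scrambles` shuffled copies of it."""
    rng = random.Random(seed)
    arms = {"original": list(sequence)}
    for k in range(n_scrambles):
        scrambled = list(sequence)
        rng.shuffle(scrambled)          # same elements, different order
        arms[f"scrambled_{k}"] = scrambled
    return arms


def random_prior(actions, rng):
    """A random distribution over actions, built from normalised exponential draws."""
    draws = [rng.expovariate(1.0) for _ in actions]
    total = sum(draws)
    return {a: d / total for a, d in zip(actions, draws)}


# Placeholder indices (assumption): a stand-in for the 64 hexagrams in sequence order.
placeholder_sequence = range(1, 65)
arms = make_control_orderings(placeholder_sequence, n_scrambles=3)
prior = random_prior(("defend", "attack", "adapt"), random.Random(1))
print(sorted(arms), prior)
```

Every scrambled arm keeps the same elements as the original, so any difference in outcome can be attributed to ordering rather than content. Fixing the seed makes each control arm reproducible across runs.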

The classical texts serve a dual role. They provide the source material for the game — the Zhanguoce's diplomatic episodes become scenarios, Han Fei's arguments become strategies. And they provide the evaluation framework: does a King Wen-guided Han behave in ways that the historical record recognizes as strategically coherent, even if it does not always win?

This last question may be the most interesting one. Victory is a clean metric but a narrow one. The historical Han did not win — it was the first state to fall. Yet Han Fei is read 2,200 years later and Zhang Cui's diplomatic theater is still studied. Survival through wisdom, even temporary survival, has a value that a win-rate percentage cannot capture.

天行，健。君子以自強不息。
— 易經・乾・象傳
Heaven's movement is ceaseless. The noble one matches this through continuous self-strengthening.

The first article in this series used this passage to explain why the King Wen sequence failed in gradient descent: optimization requires steady, continuous effort. Here the passage takes on a different meaning. Self-strengthening is not the absence of setbacks. It is the willingness to state honestly what failed, why it failed, and what remains worth trying, and then to continue.
