Experiment 04 · The Creative

From Optimization to Strategy

从优化到策略

The first hexagram of the King Wen sequence — six unbroken lines, pure creative force. After three articles of negative results, the research returns to the beginning. Not with answers, but with a more honest question.

By Augustin Chan with AI · 2026-03-24

A Map of Failures

Over the course of this research program, the King Wen sequence was tested across five distinct experimental configurations.

As a learning rate schedule: it destabilized gradient descent. As a curriculum ordering: it performed worse than random shuffling. Across different hardware platforms: its effects vanished entirely on one of them. Through adaptive selection: it proved no better than letting a simple algorithm choose. In a simplified wargame: it reduced survival rather than improving it.

Five experiments, five negative results. The honest conclusion is specific: the King Wen sequence's properties — high variance and zero autocorrelation — are detrimental to continuous optimization. Gradient descent thrives on smooth, predictable signals. The sequence provides the opposite.
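The two properties named above, variance and autocorrelation, can be measured directly on any numeric sequence. The following is a minimal sketch rather than the research code: `smooth` and `jagged` are hypothetical stand-in signals, not the actual King Wen transition values, and the helper name `autocorrelation` is introduced here for illustration only.

```python
from statistics import fmean, pvariance


def autocorrelation(values, lag=1):
    """Sample autocorrelation of `values` at the given lag."""
    mean = fmean(values)
    denom = sum((v - mean) ** 2 for v in values)
    if denom == 0:
        raise ValueError("a constant sequence has undefined autocorrelation")
    num = sum(
        (values[i] - mean) * (values[i + lag] - mean)
        for i in range(len(values) - lag)
    )
    return num / denom


# Hypothetical signals for illustration; NOT the King Wen transition values.
smooth = [i / 10 for i in range(20)]   # steady ramp: low variance, high autocorrelation
jagged = [3, 3, -3, -3] * 4            # large swings: high variance, near-zero lag-1 autocorrelation

for name, signal in (("smooth", smooth), ("jagged", jagged)):
    print(name, "variance:", pvariance(signal),
          "lag-1 autocorrelation:", round(autocorrelation(signal), 3))
```

A signal like `smooth` is what a learning rate schedule usually looks like. A signal like `jagged` has the statistical profile the text attributes to the King Wen sequence, which is why it would be expected to disturb an optimizer that assumes the next step resembles the last.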

This is a clear boundary. Whether anything lies on the other side of it remains an open question.

The Honest Case — and Its Problems

The theoretical argument for testing the King Wen sequence in strategic decision-making goes like this: in game theory, unpredictability is a virtue. A strategy that an opponent cannot model is a strategy that cannot be exploited. The King Wen sequence is unpredictable. Therefore, it should help in games.

This argument has real problems that must be stated plainly.

First, a uniformly random strategy is also unpredictable, and maximally so. The curriculum experiments already showed that random shuffling outperforms King Wen at decorrelation. If the benefit comes from unpredictability alone, random will beat King Wen in games just as it beat King Wen in curricula.

Second, game theory's answer to 'what is the optimal unpredictable strategy' is already known. It is the Nash equilibrium mixed strategy, which in many games is simply uniform random. King Wen imposes structure on randomness. Structure means pattern. Pattern means exploitability. By this logic, wherever the equilibrium is uniform random, King Wen can only be more exploitable than random, never less.

Third, any opponent using Bayesian inference will eventually learn the King Wen distribution and exploit its non-uniformity. The sequence has 63 fixed transition values. Given enough games, these can be mapped and countered.

These are not hypothetical objections. They are the direct predictions of the same experimental methodology that produced the five negative results. Intellectual honesty requires stating them before running the next experiment.
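The second and third objections can be made concrete in a toy setting. The sketch below uses rock-paper-scissors, whose equilibrium is uniform random, and is not part of the research code. The `biased` mix is a hypothetical stand-in for any structured, non-uniform strategy (it is not derived from King Wen), and the opponent's "Bayesian inference" is simplified to the posterior mean of a symmetric Dirichlet prior over action counts.

```python
import random

ACTIONS = ("rock", "paper", "scissors")
# PAYOFF[a][b]: payoff to the player choosing a against an opponent choosing b.
PAYOFF = [
    [0, -1, 1],
    [1, 0, -1],
    [-1, 1, 0],
]


def best_response(opponent_mix):
    """Return (best action index, its expected payoff) against a mixed strategy."""
    values = [
        sum(PAYOFF[a][b] * p for b, p in enumerate(opponent_mix))
        for a in range(len(ACTIONS))
    ]
    best = max(range(len(ACTIONS)), key=values.__getitem__)
    return best, values[best]


def estimate_mix(observations, pseudo_count=1.0):
    """Posterior mean under a symmetric Dirichlet prior, given observed actions."""
    counts = [pseudo_count] * len(ACTIONS)
    for action in observations:
        counts[action] += 1
    total = sum(counts)
    return [c / total for c in counts]


uniform = [1 / 3, 1 / 3, 1 / 3]
biased = [0.5, 0.3, 0.2]  # hypothetical structured strategy, not King Wen

_, uniform_loss = best_response(uniform)        # nothing to exploit
true_best, biased_loss = best_response(biased)  # positive payoff for the exploiter

# An opponent who only watches play still recovers the exploit.
rng = random.Random(0)
observed = rng.choices(range(len(ACTIONS)), weights=biased, k=2000)
learned_best, _ = best_response(estimate_mix(observed))

print("payoff against uniform:", uniform_loss)
print("payoff against biased: ", biased_loss, "by playing", ACTIONS[true_best])
print("learned counter-move:  ", ACTIONS[learned_best])
```

The point is only directional: any fixed departure from the equilibrium mix hands the opponent a positive expected payoff, and a simple counting learner finds the counter-move from observation alone.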

Where a Prior Might Matter

The honest case for King Wen is narrower than 'unpredictability helps in games.' It is this: in the early stages of learning, before an algorithm has gathered enough experience to compute its own strategy, the initial bias matters.

Consider a new player in an unfamiliar game. They must act before they understand. A uniform random strategy wastes early moves on actions that are obviously bad — attacking a much stronger neighbor, allying with a state that has betrayed you twice. A structured prior that encodes even crude intuitions about when to be aggressive and when to be cautious could produce better outcomes during the learning phase.

The trigram mapping attempts exactly this. Earth trigrams (坤) bias toward defensive actions. Heaven trigrams (乾) bias toward aggressive actions. Water trigrams (坎) bias toward adaptive, cautious play. These are not arbitrary assignments — they draw on three millennia of interpretive tradition about what these symbols mean in the context of human decision-making.

The question is not whether King Wen produces the optimal strategy. It does not — no fixed prior can. The question is whether King Wen produces a better warm start than random initialization, such that a learning algorithm converges faster or explores more productively from a King Wen starting point.
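One way to picture a warm start is as the initial logits of a softmax policy that learning is then free to overwrite. The sketch below is an assumption-laden illustration, not the project's implementation: the action names come from the game design described in the notes, but the `TRIGRAM_BIAS` table, its bias strengths, the choice of which actions count as "defensive" or "cautious", and the gradient-bandit update are all hypothetical.

```python
import math

ACTIONS = ("attack", "fortify", "ally", "reform", "betray")

# Hypothetical bias table. Only the direction of each bias comes from the text;
# the magnitudes and the specific action assignments are assumptions.
TRIGRAM_BIAS = {
    "earth": {"fortify": 1.0},               # defensive
    "heaven": {"attack": 1.0},               # aggressive
    "water": {"ally": 0.5, "fortify": 0.5},  # adaptive, cautious
}


def initial_logits(trigram=None):
    """Logits for the starting policy; no trigram means a uniform start."""
    bias = TRIGRAM_BIAS.get(trigram, {})
    return [bias.get(action, 0.0) for action in ACTIONS]


def softmax(logits):
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def bandit_step(logits, action, reward, baseline=0.0, lr=0.1):
    """One gradient-bandit update on the softmax logits."""
    probs = softmax(logits)
    advantage = reward - baseline
    return [
        x + lr * advantage * ((1.0 if i == action else 0.0) - p)
        for i, (x, p) in enumerate(zip(logits, probs))
    ]


uniform_policy = softmax(initial_logits())
earth_policy = softmax(initial_logits("earth"))

# If experience keeps rewarding "attack", the defensive prior is unlearned.
logits = initial_logits("earth")
for _ in range(50):
    logits = bandit_step(logits, ACTIONS.index("attack"), reward=1.0)
trained_policy = softmax(logits)
```

The prior changes where exploration begins, not where it can end. Whether that head start shortens learning or merely delays it is exactly what the comparison against random initialization is meant to measure.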

This is a weaker claim than the original hypothesis. It is also more testable and more honest.

惻隱之心,仁之端也。

孟子・公孫丑上

The heart of compassion is the sprout of benevolence.

Article 2 in this series framed the seed sensitivity experiment as a test of Mengzi's 'four sprouts' — innate dispositions that require cultivation to develop. The warm-start hypothesis brings this full circle. King Wen is not the answer to strategic decision-making. It is a sprout — an initial disposition that training develops, refines, or discards. The question is whether this particular sprout, shaped by three millennia of human interpretation, produces a better starting point than a random one.

The Game Ahead

The next phase of the research builds a seven-state Warring States simulation faithful to the historical topology: Qin, Han, Wei, Zhao, Qi, Chu, and Yan. Unlike the three-state triangle that killed Han in five rounds, the seven-state game recreates the diplomatic landscape where Han actually survived for 223 years — buffer-state value, shifting alliances, distant partners, and the combinatorial diplomacy that Su Qin and Zhang Yi wielded as weapons.

Han will again serve as the experimental subject. But the framing changes. The question is no longer 'Does King Wen make Han win?' It is: 'Does King Wen give a learning algorithm a better starting point for discovering Han's survival strategy?'

The controls remain rigorous. Scrambled King Wen sequences test whether the specific ordering matters or any fixed structure helps. Random priors test whether any non-uniform bias helps. Pure algorithmic agents provide a ceiling — how well can brute computation do without any human-interpretable structure?
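The control conditions described above can be sketched as small generators. This is an illustrative outline under stated assumptions, not the experiment's code: the function names are hypothetical, hexagrams are represented only by their King Wen numbers 1 to 64, and the random prior is drawn by normalizing exponential samples, which is one standard way to sample uniformly from the probability simplex.

```python
import random


def scrambled_control(sequence, rng):
    """Same elements in shuffled order: isolates the effect of the specific ordering."""
    shuffled = list(sequence)
    rng.shuffle(shuffled)
    return shuffled


def random_prior(n_actions, rng):
    """A random non-uniform distribution over actions: isolates the effect of bias as such."""
    weights = [rng.expovariate(1.0) for _ in range(n_actions)]
    total = sum(weights)
    return [w / total for w in weights]


def uniform_prior(n_actions):
    """No bias at all: the baseline every prior has to beat."""
    return [1.0 / n_actions] * n_actions


rng = random.Random(42)                # fixed seed so each control is reproducible
king_wen_order = list(range(1, 65))    # hexagram numbers in King Wen order
scrambled = scrambled_control(king_wen_order, rng)
random_bias = random_prior(5, rng)     # five action types, as in the game design
no_bias = uniform_prior(5)
```

Seeding each control separately and repeating across many seeds would let the comparison report a distribution of outcomes per condition instead of a single run.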

The classical texts serve a dual role. They provide the source material for the game — the Zhanguoce's diplomatic episodes become scenarios, Han Fei's arguments become strategies. And they provide the evaluation framework: does a King Wen-guided Han behave in ways that the historical record recognizes as strategically coherent, even if it does not always win?

This last question may be the most interesting one. Victory is a clean metric but a narrow one. The historical Han did not win — it was the first state to fall. Yet Han Fei is read 2,200 years later and Zhang Cui's diplomatic theater is still studied. Survival through wisdom, even temporary survival, has a value that a win-rate percentage cannot capture.

天行,健。君子以自強不息。

易經・乾・象傳

Heaven's movement is ceaseless. The noble one matches this through continuous self-strengthening.

The first article in this series used this passage to explain why the King Wen sequence failed in gradient descent — optimization requires steady, continuous effort. Here the passage takes on a different meaning. Self-strengthening is not the absence of setbacks. It is the willingness to state honestly what failed, why it failed, and what remains worth trying — then to continue.

Notes

[1] (reference) The five experiments: LR modulation (ADR-002), static curriculum ordering (ADR-003/006), seed sensitivity (ADR-004), adaptive curriculum (ADR-007/A), and three-state game simulation (ADR-007, this series article 3). Total: 50+ individual runs across two platforms.

[2] (technical) The Nash equilibrium mixed strategy is the set of action probabilities where no player can improve their expected outcome by changing their own strategy. In symmetric games, this is often uniform random. In asymmetric games, the equilibrium can involve non-uniform mixing, which is where a structured prior like King Wen could theoretically provide a better starting point than uniform random.

[3] (technical) Exploitability (NashConv) measures how far a strategy is from Nash equilibrium. A strategy that follows King Wen's non-uniform distribution has positive exploitability by definition: an opponent who knows the distribution can construct a best response that beats it. The question is whether King Wen's exploitability is lower than other structured priors, not whether it is zero.

[4] (technical) In reinforcement learning, the choice of initial policy significantly affects exploration efficiency. A uniform random initial policy explores all actions equally, including obviously bad ones. A prior-biased initial policy can skip low-value regions of the action space. The tradeoff: a wrong prior can also bias exploration away from the true optimum. The scrambled King Wen and random prior controls test whether King Wen's specific structure helps or whether any non-uniform prior would do equally well.

[5] (context) The seven-state game design draws from warringstates-day ADR-002, which specifies the historical adjacency map, asymmetric state archetypes (Legalism for Qin, Daoism for Chu, Eclecticism for Qi), and five action types (attack, fortify, ally, reform, betray). The implementation uses DeepMind's OpenSpiel framework.

[6] (context) Han was the first major state conquered by Qin (230 BC) and the state most dependent on strategic intelligence over raw power. If the King Wen sequence provides useful warm-start intuitions anywhere, it would be for the state that needs wisdom most and strength least. If it fails here too, the hypothesis is concluded across both continuous and discrete domains.

Five negative results point the way forward. The seven-state game is built, the agents have their personas, and the oracle speaks.