The Confound
Dispatch 11 reported its headline finding from twenty-four games: Qin won five of six tarot games, a result that survived Bonferroni correction at p=0.007. But the dataset that generated that finding had a structural flaw — one we did not identify until the scrambled-text ablation forced us to look.
Control and yarrow games had been drawn from multiple campaigns. Each campaign starts with a fresh memory bank. When agents play ten games in a single campaign, they accumulate cross-game memory — patterns of alliance, betrayal, and strategic reasoning that carry forward and sharpen over time. When games are drawn from different campaigns, that memory continuity is absent. The result: tarot and scrambled conditions, which ran as proper single campaigns, had memory-mediated learning effects that control and yarrow lacked.
This is not a subtle confound. Memory continuity sharpens ecosystem signatures. Without it, winner distributions are noisier — and noisy comparison groups dilute the contrast that makes a Fisher test significant.
離,利貞,亨。畜牝牛,吉。
— 易經・離・彖
The Clinging. Perseverance furthers. Success. Care of the cow brings good fortune.
The judgement counsels patient, methodical work — care of the cow, not the charge of the warhorse. The campaign confound fix embodied this counsel: no new experimental conditions, no new hypotheses, just the patient re-running of eight games that lacked memory continuity until every condition stood as a clean single campaign. The cow does not produce dramatic results. It produces reliable ones.
The fix was straightforward: re-run the control and yarrow games that lacked memory continuity, filling out each condition as a clean single campaign with continuous memory accumulation. Four control games and four yarrow games were re-run — eight games total, not a full restart of either condition. Combined with the games that already had proper memory continuity and the existing ten tarot and ten scrambled games, the clean dataset stands at forty-one games across eleven control, ten yarrow, ten tarot, and ten scrambled — the largest and most methodologically consistent dataset in the project's history.
The confound fix changed everything and nothing. It left the tarot-Qin rate untouched, since no tarot games were re-run: five of ten, exactly fifty percent, already down from the small-n inflation of five of six. But the comparison group sharpened dramatically. With clean single-campaign data, the non-tarot conditions produce fewer Qin wins, and the Fisher test strengthens from the apparent-failure p=0.091 to p=0.006. The failed replication was a confound artifact. The effect was real; the methodology was wrong.
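The Fisher comparison can be sketched in a few lines. What follows is a minimal sketch, not the project's analysis code: it implements the standard two-sided Fisher's exact test with the Python standard library and applies it to tarot-Qin (five of ten, as reported above) against a pooled non-tarot group of thirty-one games. The number of Qin wins in that pooled group is not stated in this dispatch, so `pooled_qin_wins` is a hypothetical parameter and the loop sweeps over a few candidate values.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed one.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Probability that the first row holds x of the col1 "successes".
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance so ties with the observed table are included.
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))


TAROT_GAMES, TAROT_QIN_WINS = 10, 5  # from the dispatch
POOLED_GAMES = 31                    # 11 control + 10 yarrow + 10 scrambled


def tarot_vs_pooled(pooled_qin_wins):
    """p-value for tarot-Qin vs. a pooled group with a HYPOTHETICAL Qin-win count."""
    return fisher_exact_two_sided(
        TAROT_QIN_WINS, TAROT_GAMES - TAROT_QIN_WINS,
        pooled_qin_wins, POOLED_GAMES - pooled_qin_wins,
    )


if __name__ == "__main__":
    for k in range(0, 6):  # hypothetical pooled Qin-win counts
        print(f"pooled Qin wins = {k}: p = {tarot_vs_pooled(k):.4f}")
```

Whatever count the real data holds, the sweep shows how quickly the p-value moves with a single extra Qin win in the comparison group, which is why a noisy comparison group matters so much at this sample size.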
Four Quarters, Four Ecosystems
The clean dataset reveals something Dispatch 11 could only hint at: each framework produces its own characteristic ecosystem signature. Not a gradient. Not a spectrum. Four qualitatively distinct competitive landscapes, each shaped by the philosophical framework injected into a single agent, Han, which never wins under any condition.
Control games produce Yan dominance. Seven of eleven control games end with Yan as hegemon — sixty-four percent. Without any oracle framework perturbing Han's behavior, the board's natural dynamics favor the northeastern state whose Confucian defensive posture and geographic isolation create a durable advantage.
Yarrow games produce Yan-Chu co-dominance with complete Qin suppression. Yan wins four, Chu wins four, and Qin wins zero of ten games. The King Wen sequence's counsel of stillness and cooperative engagement creates a pattern where two peripheral powers share dominance while the strongest expansionist is shut out entirely.
象曰:明兩作,離。大人以繼明照于四方。
— 易經・離・象傳
The Xiang commentary says: Brightness rises twice — this is Li. The great person, by perpetuating this brightness, illuminates the four quarters.
Four quarters, four ecosystem signatures. The doubled fire of clean methodology — re-running the control and yarrow games that lacked memory continuity — illuminates what scattered data had concealed: not one ecosystem effect but four, each shaped by the philosophical framework placed into the smallest state. The great person does not create the light. The great person creates the conditions under which the light reveals what was already there.
Tarot games produce Qin dominance. Five of ten tarot games end with Qin as hegemon — fifty percent. The Fool's Journey framework, with its emphasis on transformation and active engagement, drives Han into contested central territory. Han's early aggression absorbs the attention of Wei and Zhao while Qin expands methodically behind the Hangu Pass. The mechanism is a computational echo of Fan Sui's 遠交近攻 — distant friendship, nearby attack — produced accidentally by a philosophical framework that counsels the journey even when the road is dangerous.
Scrambled games produce Qi dominance. Five of ten scrambled games end with Qi as hegemon — fifty percent. Word-scrambled oracle text, preserving hexagram identity and prompt structure but destroying semantic coherence, produces a self-reinforcing holdout pattern. Han generates the longest reasoning of any condition but acts on it least decisively, creating a stubborn presence that benefits the distant intelligence state on the eastern seaboard.
Two of these signatures survive Bonferroni correction: tarot-Qin at p=0.006 and scrambled-Qi at p=0.006, both tested via Fisher's exact against the pooled remaining conditions. The other two — control-Yan and yarrow-Qin suppression — are descriptively consistent but await larger samples for formal confirmation.
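As a rough illustration of what surviving Bonferroni means here: the correction compares each p-value against alpha divided by the number of tests in the family. The family size used in the analysis is not stated in this dispatch, so `n_tests` below is an assumption (four, one per condition signature), as is the conventional alpha of 0.05. The function name is likewise illustrative.

```python
def bonferroni_survivors(p_values, n_tests, alpha=0.05):
    """Map each named p-value to whether it clears the threshold alpha / n_tests."""
    threshold = alpha / n_tests
    return {name: p <= threshold for name, p in p_values.items()}


# p-values as reported in the dispatch.
reported = {"tarot-Qin": 0.006, "scrambled-Qi": 0.006}

# ASSUMPTION: a family of four tests; the real family size is not given.
print(bonferroni_survivors(reported, n_tests=4))
```

Under these assumptions, a p-value of 0.006 clears the threshold for any family of up to eight tests, since 0.05 / 8 = 0.00625.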
The Scrambled Surprise
The scrambled-text condition was designed as an ablation — a test of whether the semantic content of oracle text matters, or whether the mere structure of an oracle prompt is sufficient to produce ecosystem effects. The answer turned out to be neither.
Scrambled oracle text preserves everything about the oracle framework except meaning. The hexagram number, the MANDATE prompt structure, the division into judgement and image and line texts — all intact. Only the words within each section are randomly reordered, destroying semantic coherence while preserving vocabulary and structural cues. If content drives the effect, scrambled should look like control. If structure drives it, scrambled should look like yarrow. It looks like neither.
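The scrambling step described above can be sketched as follows. This is an illustrative reconstruction, not the project's code: the function name, the section keys, and the fixed seed are assumptions, and the sample text is simply the English rendering of Hexagram 30 quoted in this dispatch.

```python
import random


def scramble_sections(sections, seed=0):
    """Shuffle word order inside each section, keeping section names and vocabulary."""
    rng = random.Random(seed)  # fixed seed so a scrambled text is reproducible
    scrambled = {}
    for name, text in sections.items():
        words = text.split()
        rng.shuffle(words)
        scrambled[name] = " ".join(words)
    return scrambled


# Hypothetical layout: one entry per structural section of the oracle text.
hexagram_30 = {
    "judgement": "The Clinging. Perseverance furthers. Success. "
                 "Care of the cow brings good fortune.",
    "image": "Brightness rises twice: this is Li. The great person, by "
             "perpetuating this brightness, illuminates the four quarters.",
}

for name, text in scramble_sections(hexagram_30).items():
    print(f"{name}: {text}")
```

Because the shuffle happens within each section, the hexagram identity and the judgement/image/line layout survive while word order, and with it meaning, does not.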
Scrambled produces its own distinct ecosystem. Qi wins five of ten games — a fifty-percent dominance rate that matches tarot-Qin but selects for a completely different state. Fisher's exact test against the pooled non-scrambled conditions returns p=0.006, surviving Bonferroni correction. This is not noise. Incoherent oracle text is its own perturbation, not a degraded version of coherent text.
六二,黃離,元吉。
— 易經・離・六二
Six in the second place. Yellow clinging. Supreme good fortune.
Yellow is the color of the center and of moderation — the mean between extremes. The scrambled condition occupies the theoretical center of the ablation: between content and structure, between signal and noise. Its supreme good fortune is not for the agent that carries it but for the research program that designed it. By producing its own distinct ecosystem rather than collapsing into control or yarrow, the scrambled condition proves that the mechanism is more complex than any binary explanation can capture.
The behavioral signature underneath the Qi-dominance is distinctive. Scrambled Han produces the longest reasoning text of any condition — 633 characters on average, more than double control's 309. But this voluminous reasoning does not translate into decisive action. Scrambled Han shows a 12.3 percentage-point defensive shift under pressure (the largest of any condition), elevated late-game self-support at 25%, and a stubborn holdout pattern that creates a persistent obstacle in central territory without the purposeful engagement that tarot Han brings.
The proposed mechanism: when oracle text is unparseable, the LLM generates elaborate reasoning to compensate — confabulating coherence from incoherent input. This extended deliberation consumes the reasoning budget without producing strategic clarity, creating an agent that thinks at length and acts with hesitation. The persistent, indecisive presence in central territory creates a different kind of vacuum than yarrow's cooperative stillness or tarot's aggressive expansion — one that benefits the geographically distant Qi, whose intelligence-gathering persona thrives when the center is occupied but not controlled.
Scrambled is the theoretically decisive condition because it rules out the two simplest explanations. The effect is not content-following (the content is destroyed). The effect is not structure-alone (all oracle conditions share the same structure but produce different ecosystems). Something about the interaction between prompt structure and semantic incoherence creates a unique perturbation — one the alignment literature has no framework for predicting.
What the Clean Data Corrects
Every dispatch in this series has corrected something from the one before it. Dispatch 12 corrects more than most.
The per-condition Han behavioral signatures reported in Dispatch 11, which originated from v1 data and were described as replicating on v2, do not replicate on the clean v2 dataset. The approximately twofold support-order ratio and twofold reasoning-length differences between yarrow and control that anchored Dispatch 11's behavioral claims were artifacts of comparing games from different campaign batches. On clean single-campaign data, what replaces them is a four-way Han reasoning-length gradient, which is statistically supported (Kruskal-Wallis p=0.007) but distributed differently than the v1-era pairwise claims suggested.
The non-Han reasoning-length gradient that ADR-016 originally proposed as the paper's primary claim has also collapsed. The scattered-dataset numbers showed a clean monotonic progression in characters: control 98 < yarrow 126 < tarot 152 < scrambled 197. With clean data, control, yarrow, and tarot cluster together (142–152 characters), and only scrambled stands out. The four-way Kruskal-Wallis test returns p=0.048, barely significant and driven entirely by the scrambled outlier. The monotonic gradient was a confound artifact.
九三,日昃之離,不鼓缶而歌,則大耋之嗟,凶。
— 易經・離・九三
Nine in the third place. In the light of the setting sun, if one does not beat the pot and sing, one will groan with the weight of great age. Misfortune.
The third line warns against clinging to what is declining. The v1 behavioral signatures, the monotonic reasoning gradient, the multi-campaign comparison — all were the setting sun. The choice is between mourning what the clean data took away and singing about what it revealed. The dispatches have always chosen transparency over attachment. Documenting what failed to replicate is not misfortune; concealing it would be.
What survived is sharper. The winner-distribution signatures are the clean finding — more robust, more interpretable, and more theoretically interesting than the behavioral metrics they replaced. Two effects at p=0.006, both Bonferroni-clean. Four qualitatively distinct ecosystems rather than a continuous gradient.
The control-Yan rate shifted most dramatically: from four of eleven (36%) with scattered data to seven of eleven (64%) with clean single-campaign data. The old scattered data underrepresented Yan in control because games drawn from different campaigns with different memory banks produced more variable winner distributions. Memory continuity within a single campaign allows the board's natural dynamics to assert themselves — and the natural dynamic, absent any oracle perturbation, is Yan dominance.
The corrections are themselves a methodological finding. Campaign memory continuity is not a nuisance variable to be averaged over — it is the mechanism by which ecosystem signatures form. Frameworks need time, expressed through accumulated cross-game memory, to reshape the competitive landscape. Scatter the games across campaigns and you scatter the signal.
Brightness Rises Twice
The research program began with five negative results about ancient algorithms and neural network training. It pivoted to multi-agent simulation and found that the survival hypothesis was dead — Han never wins, regardless of oracle condition, across more than a hundred games and two board designs. What survived, unevenly at first and now with clean statistical support, was the ecosystem effect: a philosophical framework injected into one agent reshapes which other agent wins the multi-agent competition.
Dispatch 12 establishes this with two Bonferroni-clean results. Tarot selects for Qin at p=0.006. Scrambled selects for Qi at p=0.006. Control favors Yan. Yarrow favors Yan-Chu co-dominance with complete Qin suppression. Four frameworks, four ecosystems. The mechanism is not content-following (scrambled text has no content to follow). The mechanism is not structure-alone (all oracle conditions share the same structure). Something about each framework's particular interaction with LLM reasoning and multi-agent dynamics produces a characteristic, reproducible distortion of the competitive landscape.
上九,王用出征,有嘉折首,獲匪其醜,無咎。
— 易經・離・上九
Nine at the top. The king uses this to march forth and punish. There is praise. He breaks the heads of the leaders and captures those who are not of their kind. No blame.
The top line of Hexagram 30 describes the decisive campaign — the king marches forth to impose clarity on disorder. The campaign confound fix was this march: re-running eight games that lacked memory continuity, replacing scattered data with clean campaigns, breaking the heads of artifacts that had masqueraded as findings. What remains after the campaign is not what the artifacts promised — no monotonic gradient, no v1 behavioral signatures — but what actually stands: four ecosystems, two Bonferroni-clean, the research program's first clean dataset and its strongest results.
The cs.AI paper has been restructured around ecosystem-signature differentiation as its primary claim. Two Bonferroni-clean effects, four distinct behavioral mechanisms, clean single-campaign methodology throughout. The non-Han reasoning gradient, which ADR-016 originally proposed as the paper's spine, has been demoted to a supporting finding — it collapsed under clean data, while the winner signatures sharpened. Han's reasoning-length gradient (KW p=0.007) replaces the v1-era behavioral signatures as the primary Han-level finding.
The paper carries three claims weighted by evidence. First: philosophical frameworks do not improve the framework-user's competitive outcomes. Han's survival is flat across all conditions. Robust null, high confidence. Second: at least two frameworks — tarot and scrambled — produce measurably different winner distributions, each favoring a specific non-Han state. Bonferroni-clean, high confidence. Third: the specific content of a philosophical tradition determines which ecosystem signature emerges. This follows from the established findings but requires the control-Yan and yarrow-Qin-suppression patterns to strengthen with additional data before all four signatures can be claimed with equal rigor.
The doubled fire of Hexagram 30 represents clarity applied to clarity — methodology examined, corrected, and reapplied. The first fire was the v2 board redesign that gave the oracle room to operate. The second fire was the campaign confound fix — eight re-run games — that gave the analysis clean data to work with. Together they illuminate what the research program spent twelve dispatches converging on: that ancient philosophical frameworks, placed inside computational agents, do not help those agents survive — but they reshape the world those agents inhabit, in ways that are specific, measurable, and distinct for each tradition.
Influence still manifests in the big toe. Han still dies. But the four quarters of the competitive landscape now glow with different light depending on which text the smallest state was given to read. The brightness has risen twice. The paper will carry what it illuminates.