Experiment: Skeptical Annealing
Hypothesis
Conjecture 6.3 predicts that skepticism (misaligned desire) is most beneficial early in learning and should be annealed over time. The alternative hypothesis (Conjecture 6.5) predicts that sustained skepticism acts as a permanent regulariser. This experiment tests both predictions by comparing three strategies: annealed skepticism, fixed skepticism, and fixed alignment.
Method
Three-agent ensemble observing a stationary Bernoulli stream (\(p = 0.7\)) for up to 200 observations. Agent configurations:
- Fixed-skeptical: desire coupling \(c = -0.5\) throughout
- Fixed-aligned: desire coupling \(c = +0.5\) throughout
- Annealed: starts at \(c = -0.5\), linearly anneals to \(c = +0.5\) over the observation window
The ensemble combines all three agents via learned weights. Calibration is measured as a squared Brier score at multiple horizons (lower is better).
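The three coupling schedules can be sketched as follows. This is a minimal illustration, not the Simplex source: the function name, the strategy labels, and the 0-based step index are assumptions, and so is the choice to reach \(c = +0.5\) exactly at the last observation. How \(c\) enters each agent's belief update is not specified in this write-up, so the sketch stops at the schedule.

```python
def coupling(strategy: str, t: int, n_obs: int = 200) -> float:
    """Desire coupling c at observation t (0-based) for one agent.

    Hypothetical helper: names and indexing are illustrative assumptions.
    """
    if strategy == "fixed_skeptical":
        return -0.5
    if strategy == "fixed_aligned":
        return 0.5
    if strategy == "annealed":
        # Linear ramp from -0.5 at t = 0 to +0.5 at t = n_obs - 1.
        frac = t / (n_obs - 1) if n_obs > 1 else 1.0
        return -0.5 + frac
    raise ValueError(f"unknown strategy: {strategy}")


for t in (0, 19, 79, 199):
    print(t, round(coupling("annealed", t), 3))
```

Under this schedule the annealed agent is still skeptical (\(c < 0\)) for roughly the first half of the window, which is worth keeping in mind when reading its results at the shorter horizons.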
Experiment 1: Strategy Comparison
Head-to-head comparison of all three strategies at the full 200-observation horizon.
| Strategy | Final Calibration (Brier) | Rank |
|---|---|---|
| Annealed (\(c: -0.5 \to +0.5\)) | 0.00146 | 1st (best) |
| Fixed-skeptical (\(c = -0.5\)) | 0.00163 | 2nd |
| Fixed-aligned (\(c = +0.5\)) | 0.00168 | 3rd (worst) |
Result 1
The annealed strategy achieves the best overall calibration (0.00146), outperforming fixed-skeptical by 10.4% and fixed-aligned by 13.1%. However, the critical question is whether the skeptic loses its advantage at longer horizons (Conjecture 6.3) or maintains it (Conjecture 6.5).
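As a worked check, the two percentages are relative reductions in Brier score with the weaker strategy as the baseline, using the table values:

\[
\frac{0.00163 - 0.00146}{0.00163} \approx 0.104,
\qquad
\frac{0.00168 - 0.00146}{0.00168} \approx 0.131 .
\]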
Experiment 2: Skeptic Advantage Across Horizons
Comparing fixed-skeptical versus fixed-aligned at three horizons: 20, 80, and 200 observations.
| Horizon | Skeptical Brier | Aligned Brier | Skeptic Wins? |
|---|---|---|---|
| 20 observations | 0.00891 | 0.00947 | Yes (5.9% better) |
| 80 observations | 0.00342 | 0.00371 | Yes (7.8% better) |
| 200 observations | 0.00163 | 0.00168 | Yes (3.0% better) |
Conjecture 6.3: Refuted
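The skeptic wins at all three tested horizons (20, 80, and 200 observations). Conjecture 6.3 predicted the skeptic would lose its advantage as \(n \to \infty\). Instead, the relative advantage rises from 5.9% to 7.8% before falling to 3.0%, and the absolute gap shrinks throughout, but the sign never reverses within the tested range. On this evidence, skepticism behaves less like a temporary exploration heuristic and more like a structural advantage.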
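The horizon comparison can be recomputed from the table. The short sketch below uses only the reported scores; the helper name is illustrative.

```python
# Brier scores copied from the Experiment 2 table: horizon -> (skeptical, aligned).
SCORES = {
    20: (0.00891, 0.00947),
    80: (0.00342, 0.00371),
    200: (0.00163, 0.00168),
}


def skeptic_advantage(skeptical: float, aligned: float) -> tuple[float, float]:
    """Return (absolute gap, relative gap) of the skeptic over the aligned agent."""
    gap = aligned - skeptical
    return gap, gap / aligned


for horizon, (skeptical, aligned) in SCORES.items():
    gap, rel = skeptic_advantage(skeptical, aligned)
    print(f"n={horizon:>3}: absolute gap {gap:.5f}, relative gap {rel:.1%}")
```

The absolute gap shrinks at each successive horizon, while the relative gap is largest at \(n = 80\); both stay positive.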
Experiment 3: Ensemble Weight Distribution
The ensemble learns how to weight the three strategies via meta-gradient. If annealing were optimal, the ensemble should converge to weight 1.0 on the annealed agent.
| Agent | Learned Weight |
|---|---|
| Annealed | 0.36 |
| Fixed-skeptical | 0.34 |
| Fixed-aligned | 0.30 |
Result 3
Ensemble weights are roughly equal (0.36 / 0.34 / 0.30), with a slight preference for the annealed and skeptical agents. The meta-gradient does not strongly favour any single strategy, suggesting that diversity itself — maintaining multiple viewpoints — is the primary source of ensemble calibration improvement.
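One way to make the diversity argument concrete is the standard ambiguity decomposition for squared error: for any convex combination of forecasts, the ensemble's squared error equals the weighted average of the members' squared errors minus the weighted spread of the members around the ensemble forecast. The sketch below checks that identity using the learned weights from the table. The three member forecasts are hypothetical values chosen for illustration, not outputs of the experiment, and scoring against the stream's \(p = 0.7\) is an assumption about the metric.

```python
WEIGHTS = [0.36, 0.34, 0.30]    # learned weights from the table above
FORECASTS = [0.71, 0.66, 0.75]  # hypothetical member forecasts (not experimental data)
TARGET = 0.7                    # Bernoulli parameter p of the stream


def ambiguity_decomposition(weights, forecasts, target):
    """Return (ensemble error, weighted member error, weighted spread)."""
    ensemble = sum(w * f for w, f in zip(weights, forecasts))
    ensemble_error = (ensemble - target) ** 2
    member_error = sum(w * (f - target) ** 2 for w, f in zip(weights, forecasts))
    spread = sum(w * (f - ensemble) ** 2 for w, f in zip(weights, forecasts))
    return ensemble_error, member_error, spread


ens_err, mem_err, spread = ambiguity_decomposition(WEIGHTS, FORECASTS, TARGET)
print(f"ensemble error        : {ens_err:.6f}")
print(f"weighted member error : {mem_err:.6f}")
print(f"weighted spread       : {spread:.6f}")
```

Because the spread term is never negative, the combined forecast is never worse than the weighted average of its members, and it gains more as the members disagree more. Whether this accounts for the near-uniform weights observed here is an interpretation, not something the sketch establishes.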
Analysis
This experiment produced a refutation and a validation:
- Conjecture 6.3 (Refuted): Skepticism was predicted to help only early. The data show the skeptic winning at all tested horizons. The relative advantage is 5.9% at \(n=20\), 7.8% at \(n=80\), and 3.0% at \(n=200\), so it is smallest at the longest horizon but never crosses zero. This is consistent with the regularisation interpretation: as \(n\) grows, the bias-variance tradeoff shifts, but within the tested range the variance reduction from skepticism exceeds the bias it introduces.
- Conjecture 6.5 (Validated): Sustained skepticism remains beneficial throughout. The fixed-skeptical agent never falls behind the fixed-aligned agent at any tested horizon.
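The regularisation reading can be illustrated with a textbook analogue. The sketch below is not the experiment's agent model: it assumes, purely for illustration, that a skeptical agent behaves like a frequency estimate shrunk toward 0.5 by a hypothetical pseudo-count `lam`, and compares its exact mean squared error with that of the unshrunk estimate on a Bernoulli stream with \(p = 0.7\).

```python
def mse_plain(p: float, n: int) -> float:
    """Exact MSE of the sample frequency k/n: pure variance, no bias."""
    return p * (1 - p) / n


def mse_shrunk(p: float, n: int, lam: float = 1.0, prior: float = 0.5) -> float:
    """Exact MSE of (k + lam * prior) / (n + lam): variance plus squared bias."""
    variance = n * p * (1 - p) / (n + lam) ** 2
    bias = lam * (prior - p) / (n + lam)
    return variance + bias ** 2


for n in (20, 80, 200):
    plain, shrunk = mse_plain(0.7, n), mse_shrunk(0.7, n)
    gain = (plain - shrunk) / plain
    print(f"n={n:>3}: plain {plain:.5f}, shrunk {shrunk:.5f}, relative gain {gain:.1%}")
```

Under these assumptions the shrunk estimate wins at each of the three horizons while its margin diminishes, which matches the qualitative picture of a positive but shrinking gap. It does not reproduce the reported scores and says nothing about how desire coupling is actually implemented.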
The annealed strategy's overall win is better explained by diversity than by annealing being optimal: the agent passes through multiple coupling regimes, effectively averaging over them. This led directly to the proof of Theorem 7.
Conclusion
Conjecture 6.3 is refuted: the skeptic wins at every tested horizon, not just early. Conjecture 6.5 is validated: sustained skepticism behaves as a Bayesian regulariser at every tested horizon. The annealed strategy achieves the best single-agent calibration (0.00146), but the ensemble weights suggest diversity matters more than any single schedule. These results strengthened the theory by motivating Theorem 7.
Reproducibility
# Clone the compiler into ./simplex (the paths below assume this directory name) and build it
git clone https://github.com/senuamedia/lab.git simplex
cd simplex && ./build.sh && cd ..
# Clone theorem-proof
git clone https://github.com/senuamedia/theorem-proof.git
cd theorem-proof
# Compile to LLVM IR (create the output directory first in case it does not exist)
mkdir -p build
../simplex/build/sxc exp_skeptical_annealing.sx -o build/exp_skeptical_annealing.ll
# Link with runtime (OpenSSL is located via Homebrew, so this step assumes brew is installed)
OPENSSL_PREFIX=$(brew --prefix openssl)
clang -O2 build/exp_skeptical_annealing.ll \
../simplex/runtime/standalone_runtime.c \
-I"$OPENSSL_PREFIX/include" \
-L"$OPENSSL_PREFIX/lib" \
-lssl -lcrypto -lm \
-o build/exp_skeptical_annealing
# Run
./build/exp_skeptical_annealing
Related Theorems
- Theorem 7: Desire as Bayesian Regulariser — motivated by this refutation
- Conjecture 6.3: Skepticism Annealing — refuted
- Conjecture 6.5: Sustained Skepticism — validated
- Experiment: Deep Anima Beliefs — single-agent desire coupling
- Experiment: Correlated Beliefs — desire sweep data