
Nash Equilibrium and Skeptical Desire

Hypothesis

The unified adaptation framework converges to the Nash equilibrium in zero-sum games. When augmented with skeptical desire regularisation, it escapes Pareto-inefficient pure-strategy Nash equilibria and reaches Pareto-superior outcomes in social dilemmas such as the Prisoner's Dilemma.

Method

  1. Zero-sum game: 2-player matching pennies. Payoff matrix for player 1: \(\begin{pmatrix}1 & -1 \\ -1 & 1\end{pmatrix}\); player 2 receives the negative. Run gradient dynamics with and without cosine-scaled projection for 1000 steps.
  2. 3-player coordination: Each player chooses a strategy \(s_i \in [0, 1]\). The payoff is \(u_i = -|s_i - \bar{s}| + \epsilon_i\), where \(\bar{s}\) is the mean strategy across the three players. Run with and without interaction matrix discovery.
  3. Prisoner's Dilemma: Standard payoff matrix with temptation T=5, reward R=3, punishment P=1, and sucker's payoff S=0. Run three variants:
    • Pure Nash (no regularisation)
    • Standard desire regularisation
    • Skeptical desire regularisation
  4. Measure: equilibrium strategy, social welfare, Pareto efficiency, convergence time.

Results

Zero-Sum Game

| Method | Player 1 Strategy | Player 2 Strategy | Exploitability | Steps to Converge |
|---|---|---|---|---|
| Gradient dynamics | 0.50 ± 0.12 | 0.50 ± 0.11 | 0.24 | Did not converge |
| With cosine projection | 0.500 ± 0.003 | 0.500 ± 0.003 | 0.006 | 142 |

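With cosine projection, the dynamics converge to the mixed Nash equilibrium (0.5, 0.5) in 142 steps, with exploitability 0.006. Standard gradient dynamics oscillates around the equilibrium and does not converge within the 1000-step budget (exploitability 0.24).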
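To make the oscillation concrete, here is a minimal Python sketch of plain simultaneous gradient ascent on matching pennies, together with one common definition of exploitability (the sum of both players' best-response gains). The step size, starting point, clipping to [0, 1], and the exploitability normalisation are assumptions for illustration; the source does not specify them, and the cosine-scaled projection itself is not reproduced here.

```python
# Row player's payoffs in matching pennies; the column player receives the negative.
A = [[1.0, -1.0], [-1.0, 1.0]]


def exploitability(p, q):
    """Sum of best-response gains when row plays heads w.p. p and column w.p. q.

    Assumed definition: because payoffs sum to zero, the current payoffs cancel
    and the total gain equals the sum of the two best-response values.
    """
    row_values = [A[i][0] * q + A[i][1] * (1.0 - q) for i in range(2)]
    col_values = [-(p * A[0][j] + (1.0 - p) * A[1][j]) for j in range(2)]
    return max(row_values) + max(col_values)


def plain_gradient_dynamics(p, q, lr=0.05, steps=1000):
    """Simultaneous gradient ascent, clipped to [0, 1] (hypothetical settings)."""
    for _ in range(steps):
        # u1(p, q) = (2p - 1)(2q - 1) and u2 = -u1.
        grad_p = 2.0 * (2.0 * q - 1.0)    # d u1 / d p
        grad_q = -2.0 * (2.0 * p - 1.0)   # d u2 / d q
        p = min(1.0, max(0.0, p + lr * grad_p))
        q = min(1.0, max(0.0, q + lr * grad_q))
    return p, q


if __name__ == "__main__":
    p, q = plain_gradient_dynamics(0.6, 0.5)
    print("final strategies:", p, q)
    print("final exploitability:", exploitability(p, q))
    print("exploitability at (0.5, 0.5):", exploitability(0.5, 0.5))
```

In this sketch each player's gradient depends only on the opponent's strategy, so the discrete update rotates around (0.5, 0.5) and spirals outward until clipping takes over, rather than settling at the equilibrium.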

3-Player Coordination

| Method | Strategy Variance | Mean Payoff | Convergence |
|---|---|---|---|
| Independent | 0.082 | 0.71 | 340 steps |
| Interaction matrix | 0.011 | 0.94 | 87 steps |

Prisoner's Dilemma

| Method | P(Cooperate) | Social Welfare | Pareto % | Nash Welfare |
|---|---|---|---|---|
| Pure Nash | 0.00 | 2.00 | 0.0% | 2.00 |
| Standard desire | 0.41 | 3.28 | 52.1% | 2.00 |
| Skeptical desire | 0.72 | 5.34 | 83.5% | 2.00 |

Pareto-optimal welfare = 6.0 (mutual cooperation, 2R). Nash welfare = 2.0 (mutual defection, 2P). Skeptical desire achieves welfare 5.34, which closes 83.5% of the gap between Nash and Pareto-optimal welfare: (5.34 − 2.0) / (6.0 − 2.0) = 0.835.
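The welfare arithmetic can be checked with a short Python sketch. The helper names are hypothetical, and the gap normalisation (Nash welfare maps to 0%, Pareto-optimal welfare to 100%) is inferred from the skeptical-desire figure rather than stated in the source.

```python
# Prisoner's Dilemma payoffs from the Method section.
T, R, P, S = 5.0, 3.0, 1.0, 0.0


def is_prisoners_dilemma(t, r, p, s):
    """Standard conditions: T > R > P > S, and mutual cooperation beats alternating."""
    return t > r > p > s and 2 * r > t + s


NASH_WELFARE = 2 * P     # mutual defection: both players receive P
PARETO_WELFARE = 2 * R   # mutual cooperation: both players receive R


def gap_closed(welfare):
    """Fraction of the Nash-to-Pareto welfare gap closed (assumed normalisation)."""
    return (welfare - NASH_WELFARE) / (PARETO_WELFARE - NASH_WELFARE)


if __name__ == "__main__":
    print(is_prisoners_dilemma(T, R, P, S))   # True
    print(round(gap_closed(5.34), 3))         # 0.835
```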

Convergence Dynamics

| Method | Steps to Equilibrium | Oscillation Amplitude | Final \(S\) |
|---|---|---|---|
| Pure Nash | 45 | 0.000 | 1.000 |
| Standard desire | 280 | 0.031 | 0.871 |
| Skeptical desire | 410 | 0.008 | 0.934 |

Analysis

  • In zero-sum games, cosine-scaled projection converges to the unique Nash equilibrium. This is expected: Theorem 11 guarantees convergence when the game has a saddle point.
  • The 3-player coordination game benefits from interaction matrix discovery (7.4x variance reduction, 3.9x faster convergence), confirming that Theorem 4's topology discovery applies to game-theoretic settings.
  • Skeptical desire in the Prisoner's Dilemma works by adding a regularisation term that penalises strategies too close to the Nash equilibrium when the welfare gap is large. The skeptic "doubts" that mutual defection is truly optimal and explores cooperative strategies.
  • Pareto efficiency stays at 83.5% rather than 100% because the mechanism does not fully remove the incentive created by the temptation payoff; some defection risk remains.
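The idea of penalising closeness to the Nash equilibrium in proportion to the welfare gap can be illustrated with a small Python sketch. Everything about the penalty here is an assumption made for illustration: its functional form, the weight `lam`, the gap normalisation, the step size, and the starting point are hypothetical and are not the framework's definition. The sketch is also not tuned to reproduce the table above.

```python
T, R, P, S = 5.0, 3.0, 1.0, 0.0
NASH_WELFARE = 2 * P      # mutual defection
PARETO_WELFARE = 2 * R    # mutual cooperation
P_NASH = 0.0              # cooperation probability at the Nash equilibrium


def welfare(p1, p2):
    """Expected total payoff when the players cooperate w.p. p1 and p2."""
    u1 = R * p1 * p2 + S * p1 * (1 - p2) + T * (1 - p1) * p2 + P * (1 - p1) * (1 - p2)
    u2 = R * p1 * p2 + S * p2 * (1 - p1) + T * (1 - p2) * p1 + P * (1 - p1) * (1 - p2)
    return u1 + u2


def payoff_grad(p_other):
    """d u_i / d p_i; always negative here, so plain ascent drives p_i to 0."""
    return (R - T) * p_other + (S - P) * (1 - p_other)


def clip(x):
    return min(1.0, max(0.0, x))


def run(lam, p1=0.5, p2=0.5, lr=0.05, steps=1000):
    """Gradient ascent on u_i minus a hypothetical closeness-to-Nash penalty.

    Assumed penalty: lam * gap * (1 - (p_i - P_NASH))**2, with the normalised
    welfare gap treated as a constant within each step.
    """
    for _ in range(steps):
        gap = (PARETO_WELFARE - welfare(p1, p2)) / (PARETO_WELFARE - NASH_WELFARE)
        g1 = payoff_grad(p2) + 2 * lam * gap * (1 - (p1 - P_NASH))
        g2 = payoff_grad(p1) + 2 * lam * gap * (1 - (p2 - P_NASH))
        p1, p2 = clip(p1 + lr * g1), clip(p2 + lr * g2)  # simultaneous update
    return p1, p2


if __name__ == "__main__":
    for lam in (0.0, 2.0):
        p1, p2 = run(lam)
        print(f"lam={lam}: P(cooperate)=({p1:.3f}, {p2:.3f}), welfare={welfare(p1, p2):.3f}")
```

With `lam = 0` the sketch reduces to plain gradient ascent and collapses to mutual defection. With a sufficiently large `lam` the penalty outweighs the payoff gradient near the Nash point, and because the gap shrinks as welfare rises, the push away from defection fades before full cooperation is reached.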

Conclusion

The results support Theorem 11. The framework converges to the Nash equilibrium in zero-sum settings and, with skeptical desire regularisation, escapes the inefficient Nash equilibrium of the Prisoner's Dilemma to reach a near-Pareto outcome: welfare 5.34 versus Nash welfare 2.0, closing 83.5% of the gap to the Pareto optimum of 6.0.

Reproducibility

../simplex/build/sxc exp_nash_equilibrium.sx -o build/exp_nash_equilibrium.ll

OPENSSL_PREFIX=$(brew --prefix openssl)
clang -O2 build/exp_nash_equilibrium.ll \
  ../simplex/runtime/standalone_runtime.c \
  -o build/exp_nash_equilibrium \
  -lm -lssl -lcrypto -L${OPENSSL_PREFIX}/lib

./build/exp_nash_equilibrium
