GAN Convergence via Projection and Skeptical Desire
Hypothesis
GAN training can be stabilised by applying cosine-scaled gradient projection to the generator-discriminator adversarial dynamics. Additionally, a learned asymmetric interaction matrix between \(G\) and \(D\) should capture the inherent asymmetry of their roles, and skeptical desire should improve generator quality by regularising against mode collapse.
Method
- Standard GAN: Alternating gradient descent on \(G\) and \(D\), no projection. 2D Gaussian mixture target (8 modes).
- Projected GAN: Cosine-scaled projection applied to \(\nabla_G\) and \(\nabla_D\) when they conflict (cosine similarity < 0).
- Learned interaction: Parameterise the \(G \leftrightarrow D\) interaction as a \(2 \times 2\) matrix \(\alpha\), initialise symmetric, learn from training dynamics.
- Skeptical desire GAN: Add skeptical desire term to \(G\) loss that penalises low-diversity outputs.
- Metrics: oscillation amplitude (standard deviation of the loss over the last 200 steps), mode coverage (fraction of the 8 modes captured), and the \(S\) score as a convergence diagnostic.
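The projection step can be sketched as follows. This is a minimal reading of "cosine-scaled projection" (the exact scaling in the experiment is assumed): when the two gradients conflict, remove the conflicting component, weighted by \(|\cos|\).

```python
import numpy as np

def cosine_scaled_projection(g, d, eps=1e-12):
    """Return g with its conflicting component along d projected out,
    scaled by |cos(g, d)|. Gradients that do not conflict (cos >= 0)
    are left untouched. A sketch; the experiment's exact rule is assumed.
    """
    cos = float(g @ d) / (np.linalg.norm(g) * np.linalg.norm(d) + eps)
    if cos >= 0:
        return g  # no conflict: keep the raw gradient
    # standard projection g - (g.d / d.d) d, damped by |cos|
    return g - abs(cos) * (g @ d) / (d @ d + eps) * d
```

For fully anti-parallel gradients (\(\cos = -1\)) the conflicting direction is removed entirely; for orthogonal gradients the update is unchanged, so the correction fades smoothly as the conflict weakens.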
Results
Training Stability
| Method | G Loss (final) | D Loss (final) | Oscillation Amp | Converged? |
|---|---|---|---|---|
| Standard GAN | 2.31 ± 1.42 | 0.12 ± 0.89 | 1.83 | No |
| Projected GAN | 1.04 ± 0.21 | 0.68 ± 0.15 | 0.34 | Yes |
| Learned interaction | 0.91 ± 0.14 | 0.72 ± 0.11 | 0.22 | Yes |
| Skeptical desire | 0.87 ± 0.09 | 0.74 ± 0.08 | 0.15 | Yes |
Learned Interaction Matrix
| Entry | Initial | Learned | Interpretation |
|---|---|---|---|
| \(\alpha_{G \to D}\) | 0.50 | 0.71 | D responds strongly to G changes |
| \(\alpha_{D \to G}\) | 0.50 | 0.38 | G responds cautiously to D feedback |
| \(\alpha_{G \to G}\) | 1.00 | 0.92 | G self-influence slightly damped |
| \(\alpha_{D \to D}\) | 1.00 | 1.04 | D self-influence slightly boosted |
The learned matrix is asymmetric: \(\alpha_{G \to D} = 0.71 \neq \alpha_{D \to G} = 0.38\). This captures the fact that the discriminator should track the generator closely, but the generator should update more cautiously to avoid mode oscillation.
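One plausible reading of how the \(2 \times 2\) matrix enters the update (the experiment's exact coupling is an assumption) is that each player's step mixes its own gradient with the other player's, weighted by the corresponding \(\alpha\) entries:

```python
import numpy as np

def interaction_step(theta_g, theta_d, grad_g, grad_d, alpha, lr=0.01):
    """One coupled gradient step under a 2x2 interaction matrix.
    Row/column order is (G, D): alpha[i, j] scales how strongly
    player j's gradient influences player i's update. A sketch; the
    experiment's exact coupling is assumed.
    """
    a = np.asarray(alpha, dtype=float)
    theta_g = theta_g - lr * (a[0, 0] * grad_g + a[0, 1] * grad_d)
    theta_d = theta_d - lr * (a[1, 1] * grad_d + a[1, 0] * grad_g)
    return theta_g, theta_d

# The learned values from the table above, in this ordering:
# [[alpha_{G->G}, alpha_{D->G}], [alpha_{G->D}, alpha_{D->D}]]
alpha_learned = np.array([[0.92, 0.38],
                          [0.71, 1.04]])
```

With the identity matrix this reduces to plain alternating gradient descent; the learned values damp \(G\)'s response to \(D\) (0.38) while amplifying \(D\)'s response to \(G\) (0.71).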
Mode Coverage
| Method | Modes Covered (of 8) | Mode Quality (avg KL) |
|---|---|---|
| Standard GAN | 3.2 ± 1.8 | 0.89 |
| Projected GAN | 6.4 ± 0.9 | 0.31 |
| Learned interaction | 7.1 ± 0.6 | 0.22 |
| Skeptical desire | 7.6 ± 0.5 | 0.14 |
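The mode-coverage numbers above can be computed with a nearest-mode count; the radius and minimum-sample threshold below are assumptions, since the experiment does not state them:

```python
import numpy as np

def mode_coverage(samples, modes, radius=0.3, min_frac=0.01):
    """Count how many target modes are 'covered': a mode counts when at
    least min_frac of the samples fall within `radius` of its centre.
    Radius and threshold are illustrative assumptions.
    """
    samples = np.asarray(samples)
    covered = 0
    for m in np.asarray(modes):
        dists = np.linalg.norm(samples - m, axis=1)
        if np.mean(dists < radius) >= min_frac:
            covered += 1
    return covered

# The usual 8-mode benchmark: Gaussian centres on a circle of radius 2
angles = np.linspace(0, 2 * np.pi, 8, endpoint=False)
modes = np.stack([2 * np.cos(angles), 2 * np.sin(angles)], axis=1)
```

Averaging this count over several runs gives the "modes covered" column; the per-mode KL column would additionally compare each cluster's empirical distribution to its target Gaussian.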
S Score as Diagnostic
| Method | \(S\) at step 100 | \(S\) at step 500 | \(S\) at step 1000 |
|---|---|---|---|
| Standard GAN | 0.12 | 0.08 | 0.11 (oscillating) |
| Projected GAN | 0.34 | 0.71 | 0.89 |
| Learned interaction | 0.41 | 0.78 | 0.93 |
| Skeptical desire | 0.45 | 0.82 | 0.96 |
\(S\) reliably distinguishes converging from oscillating training: \(S > 0.8\) at step 1000 indicates stable convergence.
Analysis
- Standard GAN oscillates because \(\nabla_G\) and \(\nabla_D\) are anti-correlated (adversarial). Cosine projection removes the conflicting component, reducing oscillation by 81%.
- The learned interaction matrix discovers the inherent asymmetry of the GAN game: \(D\) should be more responsive to \(G\) than vice versa. This is consistent with the "train D more" heuristic, but here it emerges automatically.
- Skeptical desire improves mode coverage from 3.2 to 7.6 of 8 modes by penalising the generator when its output distribution has low entropy.
- \(S\) serves as a useful training diagnostic: low \(S\) predicts training instability before loss curves show it.
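The diversity penalty behind the skeptical-desire results could be sketched as follows. The exact "skeptical desire" term is not specified here, so this assumes an entropy-style penalty: soft-assign each generated sample to the nearest target mode, then penalise low entropy of the resulting mode-usage distribution.

```python
import numpy as np

def diversity_penalty(batch, modes, weight=0.1, temp=1.0):
    """Entropy-style penalty on generator outputs (an assumed form of
    the 'skeptical desire' term). Zero when samples spread uniformly
    over the modes; grows as the batch collapses onto fewer modes.
    """
    batch, modes = np.asarray(batch), np.asarray(modes)
    # distance of every sample to every mode centre: (batch, n_modes)
    d = np.linalg.norm(batch[:, None, :] - modes[None, :, :], axis=2)
    logits = -d / temp                       # closer mode -> higher logit
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)        # soft assignment per sample
    usage = p.mean(axis=0)                   # average mode usage
    entropy = -np.sum(usage * np.log(usage + 1e-12))
    return weight * (np.log(len(modes)) - entropy)
```

Adding this term to the generator loss directly pressures it toward covering all modes, consistent with the jump in coverage reported above.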
Conclusion
These experiments support Conjecture 9.1. Cosine-scaled projection stabilises GAN training. The learned interaction matrix captures the asymmetric \(G \leftrightarrow D\) relationship. Skeptical desire regularisation reduces mode collapse. \(S\) is a practical diagnostic for GAN training health.
Reproducibility
```sh
../simplex/build/sxc exp_gan_convergence.sx -o build/exp_gan_convergence.ll
OPENSSL_PREFIX=$(brew --prefix openssl)
clang -O2 build/exp_gan_convergence.ll \
    ../simplex/runtime/standalone_runtime.c \
    -o build/exp_gan_convergence \
    -lm -lssl -lcrypto -L${OPENSSL_PREFIX}/lib
./build/exp_gan_convergence
```
Related Theorems
- Conjecture 9.1 — GAN Convergence via Projection
- Theorem 2 — Cosine-Scaled Projection
- Theorem 11 — Game-Theoretic Convergence
- Theorem 4 — Interaction Matrix