# PID Meta-Gradient with Learnable Gains
## Hypothesis
A PID controller using \(S\) (proportional), \(S'\) (derivative), and \(\int S \, dt\) (integral) as error signals, with learnable gains \(w_1, w_2, w_3\) trained via dual-number meta-gradients, should: (1) learn that only \(w_1\) matters on stable landscapes, (2) discover that \(S'\) and \(S''\) provide critical value on regime-shifting landscapes, and (3) outperform fixed-gain controllers when the landscape changes mid-optimisation.
The control signal is: \[ u_t = w_1 \cdot S_t + w_2 \cdot S'_t + w_3 \cdot \int_0^t S_\tau \, d\tau \] where \(w_1, w_2, w_3\) are updated via dual-number meta-gradients to minimise a meta-loss (e.g., final optimisation loss).
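In discrete time, this control law can be sketched with a first-difference derivative and a running-sum integral. The sketch below is illustrative, not the experiments' API; the class name, step size `dt`, and the specific discretisations are assumptions.

```python
# Hypothetical discrete-time sketch of the three-term control signal.
# S'_t is approximated by a first difference and the integral by a running sum.

class PIDController:
    def __init__(self, w1, w2, w3, dt=1.0):
        self.w = [w1, w2, w3]   # learnable gains (trained elsewhere via meta-gradients)
        self.dt = dt            # assumed step size
        self.prev_s = None      # S_{t-1}, for the finite-difference derivative
        self.integral = 0.0     # running approximation of the integral of S

    def control(self, s):
        """Return u_t = w1*S_t + w2*S'_t + w3*(running integral of S)."""
        ds = 0.0 if self.prev_s is None else (s - self.prev_s) / self.dt
        self.integral += s * self.dt
        self.prev_s = s
        return self.w[0] * s + self.w[1] * ds + self.w[2] * self.integral
```

With the uniform initialisation from the Method section, `PIDController(1/3, 1/3, 1/3)` weights all three terms equally until the gains are trained.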
## Method
- Stable landscape (exp_pid_metagradient.sx). Optimise a quadratic function. Initialise \(w_1 = w_2 = w_3 = 1/3\). Train gains via meta-gradients for 200 outer steps. Report final gain distribution.
- Regime-shifting landscape (exp_pid_regime_shift.sx). Optimise a landscape that shifts from quadratic to Rosenbrock at step 250. Compare four controllers:
  - Fixed: constant learning rate
  - P-only: \(u = w_1 \cdot S\)
  - PD: \(u = w_1 \cdot S + w_2 \cdot S'\)
  - PID: full three-term controller
- All gains are trained via dual-number automatic differentiation — no finite-difference approximations.
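The dual-number mechanism can be sketched in a few lines: carry an \((\text{value}, \varepsilon)\) pair through the inner optimisation, seed the gain's \(\varepsilon\) with 1, and read the exact meta-gradient off the final loss. The sketch below assumes a toy P-only controller on \(f(x) = x^2\); the class and function names are illustrative, not the Simplex implementation.

```python
# Minimal forward-mode AD with dual numbers, sketching an exact meta-gradient
# of the final loss with respect to a gain (no finite differences).

class Dual:
    def __init__(self, val, eps=0.0):
        self.val, self.eps = val, eps            # value and derivative part
    def __add__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.val + o.val, self.eps + o.eps)
    __radd__ = __add__
    def __sub__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.val - o.val, self.eps - o.eps)
    def __mul__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.val * o.val, self.val * o.eps + self.eps * o.val)
    __rmul__ = __mul__

def meta_grad_w1(w1, steps=50):
    """d(final loss)/d(w1) for P-only control of f(x) = x^2."""
    w = Dual(w1, 1.0)          # seed the gain's derivative part with 1
    x = Dual(1.0)              # inner optimisation variable
    for _ in range(steps):
        grad = 2.0 * x         # f'(x), the proportional error signal
        x = x - w * grad       # inner update u = w1 * S
    loss = x * x
    return loss.val, loss.eps  # (final loss, exact meta-gradient)
```

One outer step would then nudge `w1` against the returned `eps`, which is what the 200 outer steps in the stable-landscape experiment do across all three gains.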
## Results

### Stable Landscape: Learned Gains
| Gain | Initial | Learned | Interpretation |
|---|---|---|---|
| \(w_1\) (P, from \(S\)) | 0.333 | 0.987 | Dominates |
| \(w_2\) (D, from \(S'\)) | 0.333 | 0.008 | ≈ 0 |
| \(w_3\) (I, from \(\int S\)) | 0.333 | 0.005 | ≈ 0 |
On a stable landscape, meta-gradients drive \(w_2\) and \(w_3\) to near-zero. Only the proportional term \(S\) matters — derivatives carry no additional information when the landscape does not change.
### Regime-Shifting Landscape: Controller Comparison
| Controller | Final loss | Steps to recover |
|---|---|---|
| Fixed \(\eta\) | 86.6684 | >250 |
| P-only (\(S\)) | 86.6669 | 148 |
| PD (\(S + S'\)) | 86.6667 | 89 |
| PID (\(S + S' + \int S\)) | 86.6667 | 91 |
PD and PID both outperform P-only and fixed. The derivative term \(S'\) detects the regime shift and enables faster recovery. The integral term adds negligible value in this setting: PD and PID reach the same final loss at the reported precision, and PD recovers in slightly fewer steps (89 vs 91).
### Regime Shift Detection Timeline
| Signal | Fires at step | Lead over loss spike |
|---|---|---|
| \(S'\) (derivative) | 250 | 0 (instant) |
| \(S''\) (second derivative) | 250 | 0 (instant, confirms) |
| \(S\) (level) | 253 | -3 (3 steps late) |
| Loss spike | 253 | reference |
\(S'\) detects the regime shift at the exact step it occurs. \(S''\) fires simultaneously, confirming the event. The level of \(S\) responds 3 steps later, at the same time as the loss spike.
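The detection logic behind this timeline can be sketched as a threshold on the first difference of \(S\): a regime shift produces an immediate jump in \(S'\), while the level of \(S\) only crosses a threshold some steps later. The threshold value and the synthetic trace below are illustrative assumptions.

```python
# Hypothetical detector: flag a regime shift at the first step where
# |S'_t| = |S_t - S_{t-1}| exceeds a threshold.

def detect_shift(s_trace, threshold):
    """Return the first step whose first difference exceeds threshold, else None."""
    for t in range(1, len(s_trace)):
        if abs(s_trace[t] - s_trace[t - 1]) > threshold:
            return t
    return None

# Toy S trace: flat near 0, then a jump at step 250 (the regime shift).
s = [0.01] * 250 + [0.9] * 10
print(detect_shift(s, threshold=0.5))  # prints 250
```

A level-based detector on the same trace would fire only once \(S\) itself drifts past its threshold, which is the 3-step lag the table reports.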
### Learned Gains on Regime-Shifting Landscape
| Gain | PD | PID |
|---|---|---|
| \(w_1\) (P) | 0.72 | 0.68 |
| \(w_2\) (D) | 0.28 | 0.27 |
| \(w_3\) (I) | — | 0.05 |
On the regime-shifting landscape, the derivative gain \(w_2\) claims ~28% of the control signal — far from the ~0% on stable landscapes. The integral gain remains small.
## Analysis
- Parsimony on stable landscapes. Meta-gradients correctly discover that \(S\) alone suffices when the landscape is stationary. No hand-tuning needed — the unnecessary terms are driven to zero.
- Derivative value on non-stationary landscapes. When the landscape shifts, \(S'\) provides instant detection (zero lag), while \(S\) itself takes 3 steps to respond. This 3-step lead translates to 59 fewer recovery steps for PD vs P-only.
- Integral term is marginal. In both experiments, the integral \(\int S\) contributes negligibly. This may change on landscapes with persistent steady-state error, but for the tested cases, PD suffices.
- Classical control meets meta-learning. The PID structure is classical control theory (Ziegler-Nichols, 1942). The novelty is: (a) using \(S, S', \int S\) as the process variable instead of a traditional error signal, and (b) training the gains via exact dual-number meta-gradients rather than heuristic tuning rules.
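The steady-state-error caveat from the integral-term point above can be illustrated with a toy first-order plant under a constant disturbance; this is an assumption for illustration, not one of the experiments. Under P-only control the error settles at a nonzero offset \(d/w_1\), while the integral term accumulates until it cancels the disturbance.

```python
# Toy illustration (assumed plant, not from the experiments): a constant
# disturbance d leaves a steady-state error under P-only control, which
# adding the integral term removes.

def simulate(w1, w3, d=0.5, steps=2000):
    x, integral = 1.0, 0.0
    for _ in range(steps):
        s = -x                       # error signal: target is x = 0
        integral += s
        u = w1 * s + w3 * integral   # P (plus optional I) control
        x = x + u + d                # plant: disturbance pushes x away
    return x

# P-only settles at d/w1 = 1.0; PI drives x to 0 despite the disturbance.
p_only = simulate(w1=0.5, w3=0.0)
pi = simulate(w1=0.5, w3=0.05)
```

This is the classical case where the integral term earns its keep; on the tested landscapes no such persistent bias exists, which is consistent with the small learned \(w_3\).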
## Conclusion
A PID controller using \(S, S', \int S\) with dual-number meta-gradients learns the correct gain structure automatically: proportional-only on stable landscapes, proportional-derivative on regime-shifting landscapes. The derivative \(S'\) provides instant regime-shift detection with zero lag, and PD/PID controllers outperform P-only and fixed schedules on non-stationary problems (86.6667 vs 86.6669/86.6684). This bridges classical PID control theory with modern meta-learning via dual-number automatic differentiation.
## Reproducibility
```sh
# Stable landscape experiment
../simplex/build/sxc exp_pid_metagradient.sx -o build/exp_pid_metagradient.ll
OPENSSL_PREFIX=$(brew --prefix openssl)
clang -O2 build/exp_pid_metagradient.ll \
  ../simplex/runtime/standalone_runtime.c \
  -o build/exp_pid_metagradient \
  -lm -lssl -lcrypto -L${OPENSSL_PREFIX}/lib
./build/exp_pid_metagradient

# Regime-shift experiment
../simplex/build/sxc exp_pid_regime_shift.sx -o build/exp_pid_regime_shift.ll
clang -O2 build/exp_pid_regime_shift.ll \
  ../simplex/runtime/standalone_runtime.c \
  -o build/exp_pid_regime_shift \
  -lm -lssl -lcrypto -L${OPENSSL_PREFIX}/lib
./build/exp_pid_regime_shift
```
## Related
- S as Adaptive Control Signal — S-controlled learning rate
- Predictive S — information content of S and S'
- S-Entropy Connection — S vs Shannon entropy
- I-Ratio Theorem — \(I = -0.5\) iff equilibrium
- Ziegler, J. G. & Nichols, N. B. (1942) — Optimum settings for automatic controllers