
PID Meta-Gradient with Learnable Gains

Hypothesis

A PID controller using \(S\) (proportional), \(S'\) (derivative), and \(\int S \, dt\) (integral) as error signals, with learnable gains \(w_1, w_2, w_3\) trained via dual-number meta-gradients, should: (1) learn that only \(w_1\) matters on stable landscapes, (2) discover that \(S'\) and \(S''\) provide critical value on regime-shifting landscapes, and (3) outperform fixed-gain controllers when the landscape changes mid-optimisation.

The control signal is: \[ u_t = w_1 \cdot S_t + w_2 \cdot S'_t + w_3 \cdot \int_0^t S_\tau \, d\tau \] where \(w_1, w_2, w_3\) are updated via dual-number meta-gradients to minimise a meta-loss (e.g., final optimisation loss).
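In discrete time this control signal can be sketched as follows. This is an illustrative Python sketch, not the experiment's `.sx` code: it assumes \(S'_t\) is approximated by a one-step difference and the integral by a running sum with \(dt = 1\); the function name is hypothetical.

```python
# Discrete-time PID control signal: u_t = w1*S_t + w2*S'_t + w3*sum(S).
# Illustrative sketch (dt = 1): S' is a one-step difference, the integral
# is a running sum over the observed history of S.
def pid_signal(S_history, w1, w2, w3):
    S_t = S_history[-1]
    dS_t = S_history[-1] - S_history[-2] if len(S_history) > 1 else 0.0
    int_S = sum(S_history)
    return w1 * S_t + w2 * dS_t + w3 * int_S

# Example: equal gains, as at initialisation (w1 = w2 = w3 = 1/3).
u = pid_signal([0.5, 0.8, 1.1], w1=1/3, w2=1/3, w3=1/3)
```

Note that with the initial equal gains each term contributes equally; meta-gradient training then reallocates this weight across the three terms.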

Method

  1. Stable landscape (exp_pid_metagradient.sx). Optimise a quadratic function. Initialise \(w_1 = w_2 = w_3 = 1/3\). Train gains via meta-gradients for 200 outer steps. Report final gain distribution.
  2. Regime-shifting landscape (exp_pid_regime_shift.sx). Optimise a landscape that shifts from quadratic to Rosenbrock at step 250. Compare four controllers:
    • Fixed: constant learning rate
    • P-only: \(u = w_1 \cdot S\)
    • PD: \(u = w_1 \cdot S + w_2 \cdot S'\)
    • PID: full three-term controller
  3. All gains are trained via dual-number automatic differentiation — no finite-difference approximations.
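Dual numbers give exact forward-mode derivatives: each quantity carries a (value, derivative) pair through the inner optimisation, so d(meta-loss)/d(gain) falls out with no finite-difference step. A minimal self-contained sketch of the idea, for a P-only controller on \(f(x) = x^2\) (the `Dual` class, step sizes, and loop are illustrative, not the `.sx` implementation):

```python
# Minimal dual-number forward-mode AD: each Dual carries (value, derivative).
class Dual:
    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot
    def __add__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.val + o.val, self.dot + o.dot)
    def __sub__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.val - o.val, self.dot - o.dot)
    def __mul__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.val * o.val, self.val * o.dot + self.dot * o.val)

def meta_grad_w1(w1, steps=20):
    """Exact d(final loss)/dw1 for a P-only controller on f(x) = x^2."""
    w = Dual(w1, 1.0)          # seed: derivative w.r.t. w1 is 1
    x = Dual(3.0)              # inner parameter; its init is constant in w1
    for _ in range(steps):
        S = x * 2.0            # gradient of x^2 (the proportional signal S)
        x = x - w * S * 0.05   # inner update with control signal u = w1 * S
    loss = x * x               # meta-loss: final inner loss
    return loss.val, loss.dot  # (final loss, exact meta-gradient)
```

Here the returned `loss.dot` matches the analytic derivative of \(9(1 - 0.1 w_1)^{40}\) to machine precision, with no step-size tuning, which is the practical advantage over finite differences.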

Results

Stable Landscape: Learned Gains

| Gain | Initial | Learned | Interpretation |
|---|---|---|---|
| \(w_1\) (P, from \(S\)) | 0.333 | 0.987 | Dominates |
| \(w_2\) (D, from \(S'\)) | 0.333 | 0.008 | ≈ 0 |
| \(w_3\) (I, from \(\int S\)) | 0.333 | 0.005 | ≈ 0 |

On a stable landscape, meta-gradients drive \(w_2\) and \(w_3\) to near-zero. Only the proportional term \(S\) matters — derivatives carry no additional information when the landscape does not change.

Regime-Shifting Landscape: Controller Comparison

| Controller | Final loss | Steps to recover |
|---|---|---|
| Fixed \(\eta\) | 86.6684 | >250 |
| P-only (\(S\)) | 86.6669 | 148 |
| PD (\(S + S'\)) | 86.6667 | 89 |
| PID (\(S + S' + \int S\)) | 86.6667 | 91 |

PD and PID both outperform P-only and fixed. The derivative term \(S'\) detects the regime shift and enables faster recovery. The integral term adds negligible value in this setting (PD and PID are equivalent to 4 significant figures).

Regime Shift Detection Timeline

| Signal | Fires at step | Lead over loss spike |
|---|---|---|
| \(S'\) (derivative) | 250 | 0 (instant) |
| \(S''\) (second derivative) | 250 | 0 (instant, confirms) |
| \(S\) (level) | 253 | -3 (3 steps late) |
| Loss spike | 253 | (reference) |

\(S'\) detects the regime shift at the exact step it occurs. \(S''\) fires simultaneously, confirming the event. The level of \(S\) responds 3 steps later, at the same time as the loss spike.

Learned Gains on Regime-Shifting Landscape

| Gain | PD | PID |
|---|---|---|
| \(w_1\) (P) | 0.72 | 0.68 |
| \(w_2\) (D) | 0.28 | 0.27 |
| \(w_3\) (I) | n/a | 0.05 |

On the regime-shifting landscape, the derivative gain \(w_2\) claims ~28% of the control signal — far from the ~0% on stable landscapes. The integral gain remains small.

Analysis

  • Parsimony on stable landscapes. Meta-gradients correctly discover that \(S\) alone suffices when the landscape is stationary. No hand-tuning needed — the unnecessary terms are driven to zero.
  • Derivative value on non-stationary landscapes. When the landscape shifts, \(S'\) provides instant detection (zero lag), while \(S\) itself takes 3 steps to respond. This 3-step lead translates to 59 fewer recovery steps for PD vs P-only.
  • Integral term is marginal. In both experiments, the integral \(\int S\) contributes negligibly. This may change on landscapes with persistent steady-state error, but for the tested cases, PD suffices.
  • Classical control meets meta-learning. The PID structure is classical control theory (Ziegler-Nichols, 1942). The novelty is: (a) using \(S, S', \int S\) as the process variable instead of a traditional error signal, and (b) training the gains via exact dual-number meta-gradients rather than heuristic tuning rules.
  • Caveats. (1) Only two landscape types tested; results may differ on high-dimensional or stochastic landscapes. (2) The marginal value of the integral term may emerge on problems with persistent bias. (3) The dual-number meta-gradients assume differentiability of the outer loss with respect to the PID gains, which may not hold in all settings. (4) The final loss values for PD and PID are very close to the Rosenbrock minimum (86.6667); differences may be within numerical precision.

Conclusion

A PID controller using \(S, S', \int S\) with dual-number meta-gradients learns the correct gain structure automatically: proportional-only on stable landscapes, proportional-derivative on regime-shifting landscapes. The derivative \(S'\) provides instant regime-shift detection with zero lag, and PD/PID controllers outperform P-only and fixed schedules on non-stationary problems (86.6667 vs 86.6669/86.6684). This bridges classical PID control theory with modern meta-learning via dual-number automatic differentiation.

Reproducibility

```sh
# Stable landscape experiment
../simplex/build/sxc exp_pid_metagradient.sx -o build/exp_pid_metagradient.ll

OPENSSL_PREFIX=$(brew --prefix openssl)
clang -O2 build/exp_pid_metagradient.ll \
  ../simplex/runtime/standalone_runtime.c \
  -o build/exp_pid_metagradient \
  -lm -lssl -lcrypto -L${OPENSSL_PREFIX}/lib

./build/exp_pid_metagradient

# Regime-shift experiment
../simplex/build/sxc exp_pid_regime_shift.sx -o build/exp_pid_regime_shift.ll

clang -O2 build/exp_pid_regime_shift.ll \
  ../simplex/runtime/standalone_runtime.c \
  -o build/exp_pid_regime_shift \
  -lm -lssl -lcrypto -L${OPENSSL_PREFIX}/lib

./build/exp_pid_regime_shift
```
