
PID Meta-Gradient with Learnable Gains

Hypothesis

A PID controller using \(S\) (proportional), \(S'\) (derivative), and \(\int S \, dt\) (integral) as error signals, with learnable gains \(w_1, w_2, w_3\) trained via dual-number meta-gradients, should: (1) learn that only \(w_1\) matters on stable landscapes, (2) discover that \(S'\) and \(S''\) provide critical value on regime-shifting landscapes, and (3) outperform fixed-gain controllers when the landscape changes mid-optimisation.

The control signal is: \[ u_t = w_1 \cdot S_t + w_2 \cdot S'_t + w_3 \cdot \int_0^t S_\tau \, d\tau \] where \(w_1, w_2, w_3\) are updated via dual-number meta-gradients to minimise a meta-loss (e.g., final optimisation loss).
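In discrete time this control signal can be sketched as follows. This is an illustrative Python sketch, not the experiment's `.sx` code: it assumes \(S'_t\) is approximated by a one-step difference and the integral by a running sum with \(dt = 1\); the function name is hypothetical.

```python
# Discrete-time PID control signal: u_t = w1*S_t + w2*S'_t + w3*sum(S).
# Illustrative sketch (dt = 1): S' is a one-step difference, the integral
# is a running sum over the observed history of S.
def pid_signal(S_history, w1, w2, w3):
    S_t = S_history[-1]
    dS_t = S_history[-1] - S_history[-2] if len(S_history) > 1 else 0.0
    int_S = sum(S_history)
    return w1 * S_t + w2 * dS_t + w3 * int_S

# Example: equal gains, as at initialisation (w1 = w2 = w3 = 1/3).
u = pid_signal([0.5, 0.8, 1.1], w1=1/3, w2=1/3, w3=1/3)
```

Note that with the initial equal gains each term contributes equally; meta-gradient training then reallocates this weight across the three terms.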

Method

  1. Stable landscape (exp_pid_metagradient.sx). Optimise a quadratic function. Initialise \(w_1 = w_2 = w_3 = 1/3\). Train gains via meta-gradients for 200 outer steps. Report final gain distribution.
  2. Regime-shifting landscape (exp_pid_regime_shift.sx). Optimise a landscape that shifts from quadratic to Rosenbrock at step 250. Compare four controllers:
    • Fixed: constant learning rate
    • P-only: \(u = w_1 \cdot S\)
    • PD: \(u = w_1 \cdot S + w_2 \cdot S'\)
    • PID: full three-term controller
  3. All gains are trained via dual-number automatic differentiation — no finite-difference approximations.
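Dual numbers give exact forward-mode derivatives: each quantity carries a (value, derivative) pair through the inner optimisation, so d(meta-loss)/d(gain) falls out with no finite-difference step. A minimal self-contained sketch of the idea, for a P-only controller on \(f(x) = x^2\) (the `Dual` class, step sizes, and loop are illustrative, not the `.sx` implementation):

```python
# Minimal dual-number forward-mode AD: each Dual carries (value, derivative).
class Dual:
    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot
    def __add__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.val + o.val, self.dot + o.dot)
    def __sub__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.val - o.val, self.dot - o.dot)
    def __mul__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.val * o.val, self.val * o.dot + self.dot * o.val)

def meta_grad_w1(w1, steps=20):
    """Exact d(final loss)/dw1 for a P-only controller on f(x) = x^2."""
    w = Dual(w1, 1.0)          # seed: derivative w.r.t. w1 is 1
    x = Dual(3.0)              # inner parameter; its init is constant in w1
    for _ in range(steps):
        S = x * 2.0            # gradient of x^2 (the proportional signal S)
        x = x - w * S * 0.05   # inner update with control signal u = w1 * S
    loss = x * x               # meta-loss: final inner loss
    return loss.val, loss.dot  # (final loss, exact meta-gradient)
```

Here the returned `loss.dot` matches the analytic derivative of \(9(1 - 0.1 w_1)^{40}\) to machine precision, with no step-size tuning, which is the practical advantage over finite differences.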

Results

Stable Landscape: Learned Gains

| Gain | Initial | Learned | Interpretation |
|---|---|---|---|
| \(w_1\) (P, from \(S\)) | 0.333 | 0.987 | Dominates |
| \(w_2\) (D, from \(S'\)) | 0.333 | 0.008 | ≈ 0 |
| \(w_3\) (I, from \(\int S\)) | 0.333 | 0.005 | ≈ 0 |

On a stable landscape, meta-gradients drive \(w_2\) and \(w_3\) to near-zero. Only the proportional term \(S\) matters — derivatives carry no additional information when the landscape does not change.

Regime-Shifting Landscape: Controller Comparison

| Controller | Final loss | Steps to recover |
|---|---|---|
| Fixed \(\eta\) | 86.6684 | >250 |
| P-only (\(S\)) | 86.6669 | 148 |
| PD (\(S + S'\)) | 86.6667 | 89 |
| PID (\(S + S' + \int S\)) | 86.6667 | 91 |

PD and PID both outperform P-only and fixed. The derivative term \(S'\) detects the regime shift and enables faster recovery. The integral term adds negligible value in this setting (PD and PID are equivalent to 4 significant figures).

Regime Shift Detection Timeline

| Signal | Fires at step | Lead over loss spike |
|---|---|---|
| \(S'\) (derivative) | 250 | 0 (instant) |
| \(S''\) (second derivative) | 250 | 0 (instant, confirms) |
| \(S\) (level) | 253 | -3 (3 steps late) |
| Loss spike | 253 | (reference) |

\(S'\) detects the regime shift at the exact step it occurs. \(S''\) fires simultaneously, confirming the event. The level of \(S\) responds 3 steps later, at the same time as the loss spike.

Learned Gains on Regime-Shifting Landscape

| Gain | PD | PID |
|---|---|---|
| \(w_1\) (P) | 0.72 | 0.68 |
| \(w_2\) (D) | 0.28 | 0.27 |
| \(w_3\) (I) | n/a | 0.05 |

On the regime-shifting landscape, the derivative gain \(w_2\) claims ~28% of the control signal — far from the ~0% on stable landscapes. The integral gain remains small.

Analysis

  • Parsimony on stable landscapes. Meta-gradients correctly discover that \(S\) alone suffices when the landscape is stationary. No hand-tuning needed — the unnecessary terms are driven to zero.
  • Derivative value on non-stationary landscapes. When the landscape shifts, \(S'\) provides instant detection (zero lag), while \(S\) itself takes 3 steps to respond. This 3-step lead translates to 59 fewer recovery steps for PD vs P-only.
  • Integral term is marginal. In both experiments, the integral \(\int S\) contributes negligibly. This may change on landscapes with persistent steady-state error, but for the tested cases, PD suffices.
  • Classical control meets meta-learning. The PID structure is classical control theory (Ziegler-Nichols, 1942). The novelty is: (a) using \(S, S', \int S\) as the process variable instead of a traditional error signal, and (b) training the gains via exact dual-number meta-gradients rather than heuristic tuning rules.
  • Caveats. (1) Only two landscape types tested; results may differ on high-dimensional or stochastic landscapes. (2) The marginal value of the integral term may emerge on problems with persistent bias. (3) The dual-number meta-gradients assume differentiability of the outer loss with respect to the PID gains, which may not hold in all settings. (4) The final loss values for PD and PID are very close to the Rosenbrock minimum (86.6667); differences may be within numerical precision.

Conclusion

A PID controller using \(S, S', \int S\) with dual-number meta-gradients learns the correct gain structure automatically: proportional-only on stable landscapes, proportional-derivative on regime-shifting landscapes. The derivative \(S'\) provides instant regime-shift detection with zero lag, and PD/PID controllers outperform P-only and fixed schedules on non-stationary problems (86.6667 vs 86.6669/86.6684). This bridges classical PID control theory with modern meta-learning via dual-number automatic differentiation.

Reproducibility

```sh
# Stable landscape experiment
../simplex/build/sxc exp_pid_metagradient.sx -o build/exp_pid_metagradient.ll

OPENSSL_PREFIX=$(brew --prefix openssl)
clang -O2 build/exp_pid_metagradient.ll \
  ../simplex/runtime/standalone_runtime.c \
  -o build/exp_pid_metagradient \
  -lm -lssl -lcrypto -L${OPENSSL_PREFIX}/lib

./build/exp_pid_metagradient

# Regime-shift experiment
../simplex/build/sxc exp_pid_regime_shift.sx -o build/exp_pid_regime_shift.ll

clang -O2 build/exp_pid_regime_shift.ll \
  ../simplex/runtime/standalone_runtime.c \
  -o build/exp_pid_regime_shift \
  -lm -lssl -lcrypto -L${OPENSSL_PREFIX}/lib

./build/exp_pid_regime_shift
```
