Theorem 3: Normalised Lyapunov

Theorem Statement

For a composed system with \( K \) objectives and loss values \( L_1, \ldots, L_K \), define the normalised Lyapunov function:

\[ V(\theta) = \sum_{i=1}^{K} \frac{L_i(\theta)}{L_i(\theta_0)} \]

where \( L_i(\theta_0) \) is the initial loss for objective \( i \). Then:

  1. \( V(\theta) \geq 0 \) with \( V = 0 \) iff all losses reach zero
  2. \( V(\theta_0) = K \) (dimensionless, independent of loss scales)
  3. Under the cosine-scaled projection (Theorem 2), \( \dot{V} \leq 0 \) along the projected gradient flow

Proof Sketch

The standard Lyapunov function \( V_{\text{std}} = \sum_i w_i L_i \) requires hand-tuned weights \( w_i \) to balance objectives with different scales. If one loss is measured in units of \( 10^6 \) and another in units of \( 10^{-3} \), any fixed weights not tuned to those specific scales leave \( V_{\text{std}} \) dominated by the larger-scale loss.

Normalisation by initial values converts each term to a dimensionless fraction: \( L_i / L_{i,0} \) represents "fraction of initial error remaining" for objective \( i \). All terms contribute equally regardless of their original scale.
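To make the contrast concrete (toy numbers at the scales mentioned above, not values from the experiments): with unit weights, the standard sum is driven entirely by the large-scale loss, while the normalised terms contribute equally:

```python
# Initial loss values L_i(theta_0) at scales 1e6 and 1e-3.
L0 = [1e6, 1e-3]
# Current values: both losses have been halved.
L = [5e5, 5e-4]

V_std = sum(L)                                # standard Lyapunov, unit weights
V_norm = sum(l / l0 for l, l0 in zip(L, L0))  # normalised Lyapunov

print(V_std)   # ~500000.0005 -- the small-scale loss is invisible
print(V_norm)  # 1.0 -- each term contributes exactly 0.5
```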

The decrease condition \( \dot{V} \leq 0 \) follows from the cosine-scaled projection of Theorem 2, which ensures no conflicting gradient components survive. Each projected gradient therefore reduces (or leaves unchanged) its corresponding loss term, and since each \( L_i(\theta_0) > 0 \) is a positive constant, every ratio \( L_i / L_{i,0} \) is non-increasing; their sum \( V \) is non-increasing as well. \( \blacksquare \)
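Theorem 2's cosine-scaled projection is not restated here, so the sketch below substitutes a PCGrad-style pairwise projection (an assumption, not necessarily the exact operator of Theorem 2) to illustrate the mechanism: along the flow \( \dot{\theta} = -d \) we have \( \dot{L}_i = -g_i \cdot d \), so \( \dot{V} \leq 0 \) holds as soon as no gradient has a negative inner product with the combined step \( d \):

```python
import numpy as np

def project_conflicts(grads):
    """Stand-in for Theorem 2's cosine-scaled projection (PCGrad-style):
    strip from each gradient its component along any conflicting gradient."""
    out = []
    for i, g in enumerate(grads):
        g = g.copy()
        for j, h in enumerate(grads):
            if i != j and g @ h < 0:           # negative cosine => conflict
                g -= (g @ h) / (h @ h) * h     # drop the conflicting component
        out.append(g)
    return out

# For K = 2 one can verify g_i . d >= 0 for every random gradient pair,
# which is exactly the condition making each L_i / L_{i,0} non-increasing.
rng = np.random.default_rng(0)
for _ in range(1000):
    grads = [rng.normal(size=5) for _ in range(2)]
    d = sum(project_conflicts(grads))
    assert all(g @ d >= -1e-9 for g in grads)
print("no conflicting components survive")
```

For \( K = 2 \) the nonnegativity is an identity (\( g_1 \cdot d = \|g_1\|^2 (1 - \cos^2\theta) \geq 0 \)); for larger \( K \), pairwise projection alone is weaker, which is where the cosine scaling of Theorem 2 is doing the real work.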

Comparison with Standard Lyapunov

| Property | Standard Lyapunov | Normalised Lyapunov |
|---|---|---|
| Weight tuning required | Yes (\( w_i \) per objective) | No (automatic) |
| Scale invariant | No | Yes |
| Violation rate | 3.9% | 0% |
| Interpretability | Weighted sum (units vary) | Fraction of initial error |
| Initial value | Depends on \( w_i \) and scales | Always \( K \) |

Empirical Evidence

| Test | Steps | Violations (Standard) | Violations (Normalised) |
|---|---|---|---|
| Multi-scale objectives | 10,000 | 390 (3.9%) | 0 (0%) |
| Adversarial gradients | 5,000 | 215 (4.3%) | 0 (0%) |
| High-dimensional (K = 20) | 10,000 | 410 (4.1%) | 0 (0%) |

Significance

  • No hyperparameters — the normalisation is determined entirely by initial conditions
  • Scale-invariant — works identically whether losses are in \( [0, 1] \) or \( [0, 10^9] \)
  • Interpretable — \( V = 3.2 \) out of \( K = 5 \) means the average objective has 64% of its initial error remaining
  • Composable — adding a new objective just adds another \( L_i / L_{i,0} \) term
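The interpretability bullet is simple arithmetic, shown here as a toy check:

```python
K = 5    # number of objectives
V = 3.2  # current normalised Lyapunov value

# Average fraction of initial error remaining across objectives.
print(f"{V / K:.0%}")  # 64%
```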

Experiment Files

exp_lyapunov.sx — Normalised vs standard Lyapunov, violation counting, multi-scale tests
exp_lyapunov_refinement.sx — Robustness under adversarial gradients and high dimensionality