Normalised Lyapunov
Theorem Statement
For a composed system with \( K \) objectives and loss values \( L_1, \ldots, L_K \), define the normalised Lyapunov function:
\[ V(\theta) = \sum_{i=1}^{K} \frac{L_i(\theta)}{L_i(\theta_0)} \]
where \( L_i(\theta_0) \) is the initial loss for objective \( i \). Then:
- \( V(\theta) \geq 0 \) with \( V = 0 \) iff all losses reach zero
- \( V(\theta_0) = K \) (dimensionless, independent of loss scales)
- Under the cosine-scaled projection (Theorem 2), \( \dot{V} \leq 0 \) along the projected gradient flow
Proof Sketch
The standard Lyapunov function \( V_{\text{std}} = \sum_i w_i L_i \) requires hand-tuned weights \( w_i \) to balance objectives with different scales. If one loss is measured in units of \( 10^6 \) and another in units of \( 10^{-3} \), the larger-scale loss dominates \( V_{\text{std}} \) unless the weights are carefully tuned to compensate.
Normalisation by initial values converts each term to a dimensionless fraction: \( L_i / L_{i,0} \) represents "fraction of initial error remaining" for objective \( i \). All terms contribute equally regardless of their original scale.
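A quick numeric check of the scale argument, using unit weights \( w_i = 1 \) and made-up loss values:

```python
# Two objectives at very different scales; loss "b" has fallen by 90%,
# while loss "a" has barely moved.
L0 = {"a": 1.0e6, "b": 1.0e-3}
L = {"a": 0.99e6, "b": 1.0e-4}

# Standard Lyapunov with unit weights: dominated by objective "a";
# the 90% improvement on "b" is invisible.
V_std = L["a"] + L["b"]  # ~990000.0001

# Normalised Lyapunov: both terms are fractions of initial error,
# so both objectives contribute on the same scale.
V_norm = L["a"] / L0["a"] + L["b"] / L0["b"]  # 0.99 + 0.1 = 1.09
print(V_std, V_norm)
```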
The decrease condition \( \dot{V} \leq 0 \) follows from the cosine-scaled projection (Theorem 2), which removes any gradient component that would increase another objective's loss, so no conflicting components survive. A step along the projected direction is therefore non-increasing for every \( L_i \), and dividing by the positive constant \( L_{i,0} \) preserves this monotonicity, so each term \( L_i / L_{i,0} \) is non-increasing and \( \dot{V} \leq 0 \). \( \blacksquare \)
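The conflict-removal step can be sketched generically. The exact cosine-scaled rule of Theorem 2 is not reproduced here; this is a plain projection that removes, from each gradient, any component with negative inner product against another objective's gradient:

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def project_out_conflicts(grads):
    """Remove from each gradient any component that conflicts (negative
    inner product) with another objective's gradient. Generic sketch only;
    the cosine-scaled rule of Theorem 2 is not reproduced here."""
    projected = []
    for i, g in enumerate(grads):
        g = list(g)
        for j, h in enumerate(grads):
            if i == j:
                continue
            d = dot(g, h)
            if d < 0:  # g has a component that would increase L_j
                scale = d / dot(h, h)
                g = [gk - scale * hk for gk, hk in zip(g, h)]
        projected.append(g)
    return projected

# Two conflicting gradients: after projection, each direction is
# non-increasing (to first order) for the other objective.
g1, g2 = [1.0, 0.0], [-1.0, 1.0]
p1, p2 = project_out_conflicts([g1, g2])
print(p1, dot(p1, g2))  # the conflicting component of g1 is gone
```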
Comparison with Standard Lyapunov
| Property | Standard Lyapunov | Normalised Lyapunov |
|---|---|---|
| Weight tuning required | Yes (\( w_i \) per objective) | No (automatic) |
| Scale invariant | No | Yes |
| Violation rate (empirical) | 3.9–4.3% | 0% |
| Interpretability | Weighted sum (units vary) | Fraction of initial error |
| Initial value | Depends on \( w_i \) and scales | Always \( K \) |
Empirical Evidence
| Test | Steps | Violations (Standard) | Violations (Normalised) |
|---|---|---|---|
| Multi-scale objectives | 10,000 | 390 (3.9%) | 0 (0%) |
| Adversarial gradients | 5,000 | 215 (4.3%) | 0 (0%) |
| High-dimensional (K = 20) | 10,000 | 410 (4.1%) | 0 (0%) |
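The violation counts above are simply the number of steps at which \( V \) increases. A minimal counting sketch over a recorded loss trajectory (the trajectory here is synthetic, not the experiments' actual data):

```python
def count_violations(loss_history):
    """loss_history: list of per-step loss tuples (L_1, ..., L_K).
    Counts steps where the normalised Lyapunov value increases."""
    L0 = loss_history[0]
    V = [sum(L / L_init for L, L_init in zip(step, L0)) for step in loss_history]
    return sum(1 for a, b in zip(V, V[1:]) if b > a)

# Synthetic trajectory: monotone decay for both objectives -> 0 violations.
history = [(1e6 * 0.9**t, 1e-3 * 0.8**t) for t in range(100)]
print(count_violations(history))  # 0
```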
Significance
- No hyperparameters — the normalisation is determined entirely by initial conditions
- Scale-invariant — works identically whether losses are in \( [0, 1] \) or \( [0, 10^9] \)
- Interpretable — \( V = 3.2 \) out of \( K = 5 \) means the average objective has 64% of its initial error remaining
- Composable — adding a new objective just adds another \( L_i / L_{i,0} \) term
Experiment Files
- exp_lyapunov.sx — Normalised vs standard Lyapunov, violation counting, multi-scale tests
- exp_lyapunov_refinement.sx — Robustness under adversarial gradients and high dimensionality