Normalised Lyapunov
Theorem Statement
For a composed system with \( K \) objectives and loss values \( L_1, \ldots, L_K \), define the normalised Lyapunov function:
\[ V(\theta) = \sum_{i=1}^{K} \frac{L_i(\theta)}{L_i(\theta_0)} \]
where \( L_i(\theta_0) \) is the initial loss for objective \( i \). Then:
- \( V(\theta) \geq 0 \) with \( V = 0 \) iff all losses reach zero
- \( V(\theta_0) = K \) (dimensionless, independent of loss scales)
- Under the cosine-scaled projection (Theorem 2), \( \dot{V} \leq 0 \) along the projected gradient flow
Proof Sketch
The standard Lyapunov function \( V_{\text{std}} = \sum_i w_i L_i \) requires hand-tuned weights \( w_i \) to balance objectives with different scales. If one loss is measured in units of \( 10^6 \) and another in units of \( 10^{-3} \), the larger-scale loss dominates \( V_{\text{std}} \) unless the weights are carefully tuned to compensate.
Normalisation by initial values converts each term to a dimensionless fraction: \( L_i / L_{i,0} \) represents "fraction of initial error remaining" for objective \( i \). All terms contribute equally regardless of their original scale.
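A quick numeric check of the scale argument, using unit weights \( w_i = 1 \) and made-up loss values:

```python
# Two objectives at very different scales; loss "b" has fallen by 90%,
# while loss "a" has barely moved.
L0 = {"a": 1.0e6, "b": 1.0e-3}
L = {"a": 0.99e6, "b": 1.0e-4}

# Standard Lyapunov with unit weights: dominated by objective "a";
# the 90% improvement on "b" is invisible.
V_std = L["a"] + L["b"]  # ~990000.0001

# Normalised Lyapunov: both terms are fractions of initial error,
# so both objectives contribute on the same scale.
V_norm = L["a"] / L0["a"] + L["b"] / L0["b"]  # 0.99 + 0.1 = 1.09
print(V_std, V_norm)
```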
The decrease condition \( \dot{V} \leq 0 \) follows from the cosine-scaled projection (Theorem 2), which removes any gradient component that would increase another objective's loss, so no conflicting components survive. A step along the projected direction is therefore non-increasing for every \( L_i \), and dividing by the positive constant \( L_{i,0} \) preserves this monotonicity, so each term \( L_i / L_{i,0} \) is non-increasing and \( \dot{V} \leq 0 \). \( \blacksquare \)
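The conflict-removal step can be sketched generically. The exact cosine-scaled rule of Theorem 2 is not reproduced here; this is a plain projection that removes, from each gradient, any component with negative inner product against another objective's gradient:

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def project_out_conflicts(grads):
    """Remove from each gradient any component that conflicts (negative
    inner product) with another objective's gradient. Generic sketch only;
    the cosine-scaled rule of Theorem 2 is not reproduced here."""
    projected = []
    for i, g in enumerate(grads):
        g = list(g)
        for j, h in enumerate(grads):
            if i == j:
                continue
            d = dot(g, h)
            if d < 0:  # g has a component that would increase L_j
                scale = d / dot(h, h)
                g = [gk - scale * hk for gk, hk in zip(g, h)]
        projected.append(g)
    return projected

# Two conflicting gradients: after projection, each direction is
# non-increasing (to first order) for the other objective.
g1, g2 = [1.0, 0.0], [-1.0, 1.0]
p1, p2 = project_out_conflicts([g1, g2])
print(p1, dot(p1, g2))  # the conflicting component of g1 is gone
```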
Comparison with Standard Lyapunov
| Property | Standard Lyapunov | Normalised Lyapunov |
|---|---|---|
| Weight tuning required | Yes (\( w_i \) per objective) | No (automatic) |
| Scale invariant | No | Yes |
| Violation rate (empirical) | 3.9–4.3% | 0% |
| Interpretability | Weighted sum (units vary) | Fraction of initial error |
| Initial value | Depends on \( w_i \) and scales | Always \( K \) |
Empirical Evidence
| Test | Steps | Violations (Standard) | Violations (Normalised) |
|---|---|---|---|
| Multi-scale objectives | 10,000 | 390 (3.9%) | 0 (0%) |
| Adversarial gradients | 5,000 | 215 (4.3%) | 0 (0%) |
| High-dimensional (K = 20) | 10,000 | 410 (4.1%) | 0 (0%) |
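The violation counts above are simply the number of steps at which \( V \) increases. A minimal counting sketch over a recorded loss trajectory (the trajectory here is synthetic, not the experiments' actual data):

```python
def count_violations(loss_history):
    """loss_history: list of per-step loss tuples (L_1, ..., L_K).
    Counts steps where the normalised Lyapunov value increases."""
    L0 = loss_history[0]
    V = [sum(L / L_init for L, L_init in zip(step, L0)) for step in loss_history]
    return sum(1 for a, b in zip(V, V[1:]) if b > a)

# Synthetic trajectory: monotone decay for both objectives -> 0 violations.
history = [(1e6 * 0.9**t, 1e-3 * 0.8**t) for t in range(100)]
print(count_violations(history))  # 0
```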
Significance
- No hyperparameters — the normalisation is determined entirely by initial conditions
- Scale-invariant — works identically whether losses are in \( [0, 1] \) or \( [0, 10^9] \)
- Interpretable — \( V = 3.2 \) out of \( K = 5 \) means the average objective has 64% of its initial error remaining
- Composable — adding a new objective just adds another \( L_i / L_{i,0} \) term
Experiment Files
- exp_lyapunov.sx — Normalised vs standard Lyapunov, violation counting, multi-scale tests
- exp_lyapunov_refinement.sx — Robustness under adversarial gradients and high dimensionality