Balance Residual (B-Flow) Convergence

Hypothesis

Gradient descent on the balance residual \( B(\theta) \) converges to equilibrium with dramatically higher precision than gradient descent on the loss. The balance residual is defined as:

\[ B(\theta) = \frac{\left\|\sum_i g_i(\theta)\right\|^2}{\sum_i \|g_i(\theta)\|^2} \]

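Here \( g_i(\theta) = \nabla L_i(\theta) \) is the gradient of the \(i\)-th loss. At equilibrium the gradients cancel, so \( B = 0 \). B-flow performs gradient descent on \( B \) directly rather than on the individual losses.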
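As a concrete illustration of the definition, the following minimal Python sketch evaluates \( B \) for a list of gradient vectors. The function name `balance_residual` and the sample gradients are hypothetical and are not taken from the experiment's source.

```python
def balance_residual(grads):
    """Return ||sum_i g_i||^2 / sum_i ||g_i||^2 for equal-length gradient vectors."""
    dim = len(grads[0])
    # Component-wise sum of all gradients.
    total = [sum(g[k] for g in grads) for k in range(dim)]
    numerator = sum(t * t for t in total)
    denominator = sum(x * x for g in grads for x in g)
    if denominator == 0.0:
        # Every gradient is zero, so the ratio is 0/0 and left undefined here.
        raise ValueError("all gradients are zero; B is undefined")
    return numerator / denominator


# Hypothetical sample gradients, chosen only to show the range of B.
b_cancel = balance_residual([[1.0, -2.0], [-1.0, 2.0]])   # opposing gradients
b_orthogonal = balance_residual([[1.0, 0.0], [0.0, 1.0]])  # orthogonal gradients
b_aligned = balance_residual([[1.0, 2.0], [1.0, 2.0]])     # identical gradients
print(b_cancel, b_orthogonal, b_aligned)
```

By the Cauchy-Schwarz inequality, \( B \) lies between 0 (gradients cancel) and the number of losses \( n \) (all gradients identical), which is what the three sample cases exercise.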

Method

Setup: Three test problems of increasing complexity: 1D convex, 2D convex, and non-convex.

Parameters:

  • 1D: 2 quadratic losses with different minima
  • 2D: 2 quadratic losses in 2-dimensional parameter space
  • Non-convex: losses with multiple local minima
  • Steps: 1000 per method per problem

Procedure: For each problem, run B-flow (gradient descent on \( B \)) and loss-flow (gradient descent on \( \sum L_i \)) side by side. Compare the final balance residual achieved by each method.
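The procedure can be sketched in Python for a pair of quadratic losses. This is an illustration of the mechanics only, not the experiment's Simplex source: the matrices `A`, centers `C`, starting point, and step sizes are all assumptions, and the sketch is not expected to reproduce the figures reported in the Results section.

```python
import numpy as np

# Hypothetical 2D problem: L_i(theta) = 0.5 * (theta - c_i)^T A_i (theta - c_i).
A = [np.diag([1.0, 1.0]), np.diag([2.0, 2.0])]
C = [np.array([0.0, 0.0]), np.array([3.0, 0.0])]


def grads(theta):
    """Per-loss gradients g_i = A_i (theta - c_i)."""
    return [a @ (theta - c) for a, c in zip(A, C)]


def balance_residual(theta):
    g = grads(theta)
    s = sum(g)
    return float(s @ s) / float(sum(gi @ gi for gi in g))


def grad_B(theta):
    """Quotient-rule gradient of B; the Jacobian of g_i is the constant A_i."""
    g = grads(theta)
    s = sum(g)
    num = float(s @ s)
    den = float(sum(gi @ gi for gi in g))
    d_num = 2.0 * (sum(A).T @ s)
    d_den = 2.0 * sum(a.T @ gi for a, gi in zip(A, g))
    return (d_num * den - num * d_den) / den**2


def grad_total_loss(theta):
    return sum(grads(theta))


def descend(grad_fn, theta0, lr, steps=1000):
    """Plain fixed-step gradient descent."""
    theta = theta0.copy()
    for _ in range(steps):
        theta = theta - lr * grad_fn(theta)
    return theta


theta0 = np.array([0.5, 1.0])                        # assumed starting point
theta_b = descend(grad_B, theta0, lr=0.1)            # B-flow
theta_l = descend(grad_total_loss, theta0, lr=0.1)   # loss-flow
print("B-flow   :", theta_b, balance_residual(theta_b))
print("loss-flow:", theta_l, balance_residual(theta_l))
```

On this small, well-conditioned instance both flows should end very close to \( \theta = (2, 0) \), where the two gradients cancel. Any gap between the methods will depend on conditioning, step size, and step budget, none of which are specified here.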

Results

1D Problem

Method | Final B
B-flow | \(0\) (exact)
Loss-flow | \(> 0\)

2D Problem

Method | Final B
B-flow | \(4.7 \times 10^{-34}\)
Loss-flow | \(2.3 \times 10^{-4}\)

Precision ratio: on the 2D problem, loss-flow's final \( B \) is \( (2.3 \times 10^{-4}) / (4.7 \times 10^{-34}) \approx 4.9 \times 10^{29} \) times larger than B-flow's.

Non-convex Problem

Method | Outcome
B-flow alone | May converge to spurious B=0 (saddle point)
Loss-flow alone | Finds lower-loss region but imprecise equilibrium
Two-phase (loss then B) | Best of both: reaches correct basin, then refines to B ≈ 0

Analysis

  • 1D exact (B=0): In the 1D convex case, B-flow finds the exact Pareto-optimal point where gradients cancel. The loss-flow settles for a compromise that does not fully balance the gradients.
  • 2D near-exact (\(4.7 \times 10^{-34}\)): B-flow's final residual is roughly 30 orders of magnitude smaller than loss-flow's \(2.3 \times 10^{-4}\) (a ratio of about \(4.9 \times 10^{29}\)). This is because B-flow's gradient \( \nabla B \) points directly toward gradient cancellation, while loss-flow's gradient \( \nabla \sum L_i \) only incidentally approaches equilibrium.
  • Non-convex two-phase: B-flow's weakness is that \( B = 0 \) can occur at saddle points or undesirable equilibria in non-convex landscapes. The two-phase strategy uses loss-flow to navigate to the correct basin, then switches to B-flow for high-precision refinement. This combines the global search of loss-flow with the precision of B-flow.
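The two-phase strategy can be illustrated on a small 1D non-convex example. The sketch below is a hypothetical setup, not the experiment's actual test problem: the double-well loss `L1`, the quadratic `L2`, the starting point, the step sizes, and the 50-step first phase are all assumptions chosen for illustration.

```python
# Hypothetical 1D non-convex problem (constants chosen for illustration only):
#   L1(t) = (t^2 - 1)^2 / 4  ->  g1 = t^3 - t,         h1 = 3 t^2 - 1
#   L2(t) = 0.1 (t - 0.5)^2  ->  g2 = 0.2 (t - 0.5),   h2 = 0.2

def g(t):
    """Per-loss first derivatives."""
    return (t**3 - t, 0.2 * (t - 0.5))


def h(t):
    """Per-loss second derivatives."""
    return (3.0 * t**2 - 1.0, 0.2)


def total_grad(t):
    return sum(g(t))


def total_curvature(t):
    return sum(h(t))


def B(t):
    g1, g2 = g(t)
    return (g1 + g2) ** 2 / (g1**2 + g2**2)


def dB(t):
    """Derivative of B by the quotient rule."""
    (g1, g2), (h1, h2) = g(t), h(t)
    num, den = (g1 + g2) ** 2, g1**2 + g2**2
    d_num = 2.0 * (g1 + g2) * (h1 + h2)
    d_den = 2.0 * (g1 * h1 + g2 * h2)
    return (d_num * den - num * d_den) / den**2


def descend(grad_fn, t, lr, steps):
    """Plain fixed-step gradient descent on a scalar parameter."""
    for _ in range(steps):
        t -= lr * grad_fn(t)
    return t


t0 = 0.0                                             # assumed starting point
t_b = descend(dB, t0, lr=1e-3, steps=1000)           # B-flow alone
t_loss = descend(total_grad, t0, lr=0.1, steps=50)   # phase 1: loss-flow
t_two = descend(dB, t_loss, lr=1e-3, steps=1000)     # phase 2: B-flow refinement

for name, t in [("B-flow alone", t_b), ("two-phase", t_two)]:
    print(name, t, B(t), "curvature of total loss:", total_curvature(t))
```

With these assumed constants, B-flow alone settles near \( \theta \approx -0.13 \), where \( B \approx 0 \) but the total loss has negative curvature (a local maximum, the 1D analogue of a saddle). The two-phase run first moves into the basin near \( \theta \approx 0.95 \) and then refines to a \( B \approx 0 \) point with positive curvature.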

Conclusion

Pass. B-flow reaches \( B = 0 \) exactly in 1D and \( B = 4.7 \times 10^{-34} \) in 2D, against \( 2.3 \times 10^{-4} \) for loss-flow in 2D. The two-phase strategy handles non-convex landscapes. Theorem 14 is validated.

Reproducibility

../simplex/build/sxc exp_balance_residual.sx -o build/exp_balance_residual.ll
clang -O2 build/exp_balance_residual.ll ../simplex/runtime/standalone_runtime.c \
  -o build/exp_balance_residual -lm -lssl -lcrypto -L$(brew --prefix openssl)/lib
./build/exp_balance_residual

Related