← All Conjectures

Conjecture 6.7: Transfer Learning Threshold

VALIDATED There exists a sharp crossover point where transferring learned structure between tasks becomes harmful.

Statement

For two tasks \( A \) and \( B \) parameterised by distributions \( p_A \) and \( p_B \), there exists a critical distance \( \delta^* \) such that transfer learning improves performance when \( |p_A - p_B| < \delta^* \) and degrades performance when \( |p_A - p_B| > \delta^* \).

Status: Validated

The crossover occurs at \( |p_A - p_B| \approx 0.15 \). Below this threshold, sharing interaction matrix structure between tasks accelerates convergence. Above it, the transferred structure introduces systematic bias that slows or prevents convergence.

Evidence Summary

The experiment sweeps task distance from 0.0 to 0.5 in steps of 0.01 and measures convergence time with and without transfer:

  • \( |p_A - p_B| = 0.05 \): transfer saves 40% of convergence time
  • \( |p_A - p_B| = 0.10 \): transfer saves 25%
  • \( |p_A - p_B| = 0.15 \): crossover point — transfer has no net effect
  • \( |p_A - p_B| = 0.20 \): transfer adds 15% to convergence time
  • \( |p_A - p_B| = 0.30 \): transfer adds 45% — significant negative transfer

The crossover is clean and reproducible. The mechanism is that the interaction matrix learned from task \( A \) encodes gradient relationships specific to \( p_A \); when these relationships differ sufficiently from \( p_B \), the transferred structure creates gradient conflicts that the cosine-scaled projection must resolve, adding overhead.

Relevant Experiments

  • exp_memory_dynamics.sx — transfer dynamics across task distributions
  • exp_sensitivity.sx — robustness of the crossover point to hyperparameters

What This Means

This result provides a practical decision rule: measure the distribution distance between source and target tasks, and only transfer if the distance is below ~0.15. The threshold is not a universal constant (it depends on system dimensionality and contraction rates), but the existence of a sharp crossover is robust. This formalises the well-known practitioner intuition about "negative transfer" and gives it a quantitative basis within the adaptation framework.