From Teacher Pathways to Invariant Manifolds: Consensus Subspace Distillation for TSFMs
Abstract
Time-series foundation models (TSFMs) deliver strong cross-domain generalization, but their scale makes deployment costly. Knowledge distillation is a natural compression route, yet prior TSFM distillation typically imitates teacher outputs, features, or pairwise relations; it therefore remains tightly coupled to teacher-specific training trajectories and underutilizes two empirical properties: (i) high-level representations across model scales tend to converge toward a shared, approximately low-rank geometry, and (ii) layer-wise utility follows a long-tail pattern. We propose consensus subspace distillation, which reframes distillation as aligning a student to a model-agnostic geometric object: a scale-invariant low-rank consensus subspace together with its center statistics. Offline, we screen high-contribution layers via a drop-layer marginal loss, estimate a shrinkage-stabilized covariance from their embeddings, and derive a truncated eigensubspace that defines a consensus projector. Online, we project student embeddings into this subspace and match the teacher’s projected mean and covariance with a lightweight mean–covariance objective, enabling stable optimization without rigid pointwise feature binding. To mitigate subset-induced bias, we further introduce a frequency-domain uncertainty injection mechanism that inflates spectral density according to characteristic-function discrepancies and injects dispersion only along the consensus directions. Across forecasting and imputation, the distilled student matches or slightly improves upon the teacher, while exhibiting a predictable trade-off under strict zero-shot classification. With MOMENT-Large as teacher, we achieve roughly 90% parameter reduction and substantial distillation-time savings while retaining comparable performance across multiple time-series tasks. Code and compressed weights are available at anonymous.4open.science/r/CSD-13C3/.
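Because the abstract compresses the offline subspace estimation and the online matching objective into a few clauses, the minimal Python sketch below may help fix ideas. It is not the paper's implementation: the function names (consensus_projector, mean_cov_loss), the scaled-identity shrinkage target, the choice of rank, and the squared-error matching are all illustrative assumptions, and the layer-screening and frequency-domain uncertainty-injection steps are omitted; the released code at anonymous.4open.science/r/CSD-13C3/ is authoritative.

    import torch

    def consensus_projector(layer_embeddings, rank=16, shrinkage=0.1):
        # Offline stage (illustrative): pool embeddings from the screened
        # high-contribution layers, estimate a shrinkage-stabilized
        # covariance, and keep the top-`rank` eigenvectors as the
        # consensus projector.
        x = torch.cat(layer_embeddings, dim=0)       # (N, d) pooled samples
        mu = x.mean(dim=0)                           # center statistics
        xc = x - mu
        cov = xc.T @ xc / (x.shape[0] - 1)           # sample covariance
        # Assumed shrinkage form: blend toward a scaled identity for stability.
        target = torch.eye(cov.shape[0]) * cov.diagonal().mean()
        cov = (1.0 - shrinkage) * cov + shrinkage * target
        _, eigvecs = torch.linalg.eigh(cov)          # eigenvalues ascending
        P = eigvecs[:, -rank:]                       # (d, rank) truncated eigensubspace
        return P, mu

    def projected_stats(emb, P):
        # Mean and covariance of embeddings after projection into the subspace.
        z = emb @ P                                  # (B, rank)
        mu = z.mean(dim=0)
        zc = z - mu
        return mu, zc.T @ zc / (z.shape[0] - 1)

    def mean_cov_loss(student_emb, teacher_emb, P):
        # Online stage (illustrative): match the teacher's projected mean and
        # covariance instead of binding features pointwise.
        s_mu, s_cov = projected_stats(student_emb, P)
        with torch.no_grad():                        # teacher statistics are fixed targets
            t_mu, t_cov = projected_stats(teacher_emb, P)
        return (s_mu - t_mu).pow(2).mean() + (s_cov - t_cov).pow(2).mean()

    # Toy usage: d=64 embedding width, two screened layers, batch of 32.
    teacher_layers = [torch.randn(512, 64) for _ in range(2)]
    P, _ = consensus_projector(teacher_layers, rank=16)
    loss = mean_cov_loss(torch.randn(32, 64), torch.randn(32, 64), P)

Note that this sketch assumes the student and teacher embeddings share the same width d; when they differ, a small learned linear adapter mapping student embeddings into the teacher's d-dimensional space before projection would be needed, which is likewise an assumption here rather than a detail stated in the abstract.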