Nearly Optimal Bounds for Cyclic Forgetting
Authors: William Swartworth, Deanna Needell, Rachel Ward, Mark Kong, Halyun Jeong
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We provide theoretical bounds on the forgetting quantity in the continual learning setting for linear tasks, where each round of learning corresponds to projecting onto a linear subspace. For a cyclic task ordering on T tasks repeated m times each, we prove the best known upper bound of O(T²/m) on the forgetting. Notably, our bound holds uniformly over all choices of tasks and is independent of the ambient dimension. Our main technical contribution is a characterization of the union of all numerical ranges of products of T (real or complex) projections as a sinusoidal spiral, which may be of independent interest. |
| Researcher Affiliation | Academia | Halyun Jeong, University of California Los Angeles (hajeong@math.ucla.edu); Mark Kong, University of California Los Angeles (markkong@ucla.edu); Deanna Needell, University of California Los Angeles (deanna@math.ucla.edu); William Swartworth, Carnegie Mellon University (wswartwo@andrew.cmu.edu); Rachel Ward, University of Texas at Austin (rward@math.utexas.edu) |
| Pseudocode | No | The paper describes algorithms (e.g., Kaczmarz update) in prose but does not provide them in structured pseudocode or an algorithm block. |
| Open Source Code | No | The paper does not contain any statement about releasing source code or a link to a code repository. |
| Open Datasets | No | The paper works in a theoretical setting with abstract "tasks" and "data points"; it does not use or refer to any specific publicly available datasets for training. |
| Dataset Splits | No | This is a theoretical paper and does not conduct empirical experiments, thus it does not mention training, validation, or test dataset splits. |
| Hardware Specification | No | This is a theoretical paper and does not describe any computational experiments, thus it does not mention hardware specifications. |
| Software Dependencies | No | This is a theoretical paper and does not describe any computational experiments, thus it does not mention specific software dependencies with version numbers. |
| Experiment Setup | No | This is a theoretical paper and does not describe any experimental setup with hyperparameters or training settings. |
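Although the paper is purely theoretical, the setting it analyzes is easy to simulate. The sketch below is an illustrative reconstruction, not code from the paper: each task is a consistent linear system A_i x = b_i (so learning a task means projecting onto an affine subspace, as in the Kaczmarz update mentioned above), tasks are visited in cyclic order for m cycles, and "forgetting" is measured as the worst residual over all previously solved tasks. All names and parameter choices here are assumptions made for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)
d, T, m = 20, 5, 50          # ambient dimension, number of tasks, number of cycles
x_star = rng.normal(size=d)  # a shared solution, so the tasks are jointly realizable

# Each "task" is a consistent linear system A_i x = b_i, i.e. an affine
# subspace constraint that the iterate must satisfy when the task is learned.
tasks = []
for _ in range(T):
    A = rng.normal(size=(3, d))
    tasks.append((A, A @ x_star))

def project(x, A, b):
    """Orthogonally project x onto the affine solution set {x : A x = b}."""
    return x - np.linalg.pinv(A) @ (A @ x - b)

def forgetting(x):
    """Worst squared residual over all tasks (each was solved exactly when visited)."""
    return max(np.linalg.norm(A @ x - b) ** 2 for A, b in tasks)

x = np.zeros(d)
for cycle in range(m):
    for A, b in tasks:
        x = project(x, A, b)

print(f"forgetting after {m} cycles: {forgetting(x):.3e}")
```

Consistent with the paper's O(T²/m) bound, the measured forgetting shrinks as the number of cycles m grows; with random subspaces the decay is in fact much faster than the worst case, since the bound is uniform over all (including adversarial) task choices.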