Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Reawakening knowledge: Anticipatory recovery from catastrophic interference via structured training
Authors: Yanlai Yang, Matt Jones, Michael C. Mozer, Mengye Ren
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We explore the training dynamics of neural networks in a structured non-IID setting where documents are presented cyclically in a fixed, repeated sequence. Through comprehensive experiments and visualizations, we demonstrate a new mechanism by which over-parametrized neural networks can recover from catastrophic interference and uncover new insights into training over-parameterized networks in cyclically structured environments. |
| Researcher Affiliation | Collaboration | Yanlai Yang (1), Matt Jones (2), Michael C. Mozer (3, 2), and Mengye Ren (1). (1) New York University; (2) University of Colorado, Boulder; (3) Google DeepMind |
| Pseudocode | No | Just as when training a deep net, we assume here that representation learning occurs slowly, and that one training step for task $i$ involves a single gradient update of $P$ with step size $\alpha$: $P \leftarrow P - \alpha\,(P x_i - f_i(w))\,x_i^\top$ (1). In contrast, at each training step, $w$, analogous to the fast-adapting weights in a neural network, can be rapidly tuned to solve for task $i$, yielding the loss minimizer conditional on $P$: $w \leftarrow f_i^{-1}(P x_i)$ (2). |
| Open Source Code | Yes | We provide the code and instructions for reproducing main experimental results in the supplementary material. |
| Open Datasets | Yes | For the LLM experiments, we use the CNN/Daily Mail news dataset [17]. For the vision experiments, we use images sampled from CIFAR-10 [18] and ImageNet [19]. |
| Dataset Splits | No | We use the same documents for both training and evaluation. Our goal here is not to determine whether a trained model generalizes to new documents, but rather to study the memory for a particular document as a function of position within the training history. |
| Hardware Specification | Yes | Each experiment presented in the paper is run with one NVIDIA A100 GPU, 2 CPUs, and 32GB of RAM. |
| Software Dependencies | No | We use the Huggingface Transformers Library [63] for fine-tuning the LLMs. |
| Experiment Setup | Yes | Unless otherwise stated, the default hyperparameters in the subsequent experiments are T = 25, C = 256, M = 10, E = 5. We use the average cross entropy loss (average negative log-likelihood for each token) as our training and evaluation metric. The learning rate is 0.001 for vanilla gradient descent and 0.00001 for Adam. |
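The Pseudocode row quotes the paper's toy model. In it, the slow weights $P$ take one gradient step per task (Eq. 1), and the fast weights $w$ then jump to the loss minimizer given $P$ (Eq. 2). Below is a minimal NumPy sketch of that alternating update. Everything concrete in it is an assumption, not the authors' code:

- the squared-error loss $\tfrac12\lVert P x_i - f_i(w)\rVert^2$, chosen because its gradient in $P$ matches Eq. 1;
- the hypothetical invertible linear task maps $f_i(w) = A_i w$;
- the dimensions, the step size, and the helper names.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes and step size (not from the paper).
d_in, d_out, n_tasks = 8, 4, 5
alpha = 0.05  # step size for the slow weights P

# Hypothetical tasks: input x_i and invertible map f_i(w) = A_i @ w.
xs = rng.normal(size=(n_tasks, d_in))
As = rng.normal(size=(n_tasks, d_out, d_out)) + 3.0 * np.eye(d_out)


def f(i, w):
    return As[i] @ w


def f_inv(i, y):
    # Inverse of the assumed linear task map: solve A_i w = y for w.
    return np.linalg.solve(As[i], y)


def loss(i, P, w):
    # Assumed loss 0.5 * ||P x_i - f_i(w)||^2; its gradient in P is (P x_i - f_i(w)) x_i^T.
    r = P @ xs[i] - f(i, w)
    return 0.5 * float(r @ r)


def train_step(i, P, w):
    # Eq. (1): a single slow gradient step on P with w held fixed.
    residual = P @ xs[i] - f(i, w)
    P = P - alpha * np.outer(residual, xs[i])
    # Eq. (2): fast weights w jump to the task-i loss minimizer given the updated P.
    w = f_inv(i, P @ xs[i])
    return P, w


# Cyclic training: the same fixed task order is repeated every epoch.
P = rng.normal(size=(d_out, d_in))
w = np.zeros(d_out)
for epoch in range(3):
    for i in range(n_tasks):
        P, w = train_step(i, P, w)
```

With $w$ held fixed, the $P$ step scales the task-$i$ residual by $(1 - \alpha\lVert x_i\rVert^2)$. The $w$ update then drives the task-$i$ loss to zero. This sketch makes no claim about reproducing the paper's anticipatory-recovery results.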
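The Experiment Setup row lists default values without defining the symbols. The sketch below adopts a labeled interpretation rather than a quote:

- $T$ is the number of documents.
- $C$ is the context length in tokens.
- $M$ is the number of consecutive gradient steps on each document.
- $E$ is the number of epochs.

Under that reading, cyclic training presents the same documents in the same fixed order every epoch. The sketch also computes the average per-token negative log-likelihood named as the metric. The helpers `cyclic_schedule` and `average_token_nll` are hypothetical.

```python
import math
from typing import List, Tuple

# Defaults from the Experiment Setup row; the meaning of each symbol is assumed.
T, C, M, E = 25, 256, 10, 5  # documents, context tokens, steps per document, epochs
LR_SGD, LR_ADAM = 1e-3, 1e-5  # vanilla gradient descent vs. Adam


def cyclic_schedule(num_docs: int, steps_per_doc: int, epochs: int) -> List[Tuple[int, int, int]]:
    """Return (epoch, doc_index, step) triples.

    Documents appear in the same fixed order every epoch, and each one
    receives steps_per_doc consecutive updates before moving on.
    """
    return [
        (e, d, s)
        for e in range(epochs)
        for d in range(num_docs)
        for s in range(steps_per_doc)
    ]


def average_token_nll(token_probs: List[float]) -> float:
    """Average negative log-likelihood per token, given the probability
    the model assigns to each target token."""
    return -sum(math.log(p) for p in token_probs) / len(token_probs)


schedule = cyclic_schedule(T, M, E)
print(len(schedule))  # 1250 updates in total (5 * 25 * 10)
print(schedule[:3])   # [(0, 0, 0), (0, 0, 1), (0, 0, 2)]
print(average_token_nll([0.5] * C))  # ln 2, about 0.693
```

As a worked example, a model that assigns probability 1/2 to every one of the $C = 256$ target tokens has an average loss of $\ln 2 \approx 0.693$ nats per token.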