Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Gradient-based Editing of Memory Examples for Online Task-free Continual Learning
Authors: Xisen Jin, Arka Sadhu, Junyi Du, Xiang Ren
NeurIPS 2021 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments validate the effectiveness of GMED, and our best method significantly outperforms baselines and previous state-of-the-art on five out of six datasets. |
| Researcher Affiliation | Academia | Xisen Jin Arka Sadhu Junyi Du Xiang Ren University of Southern California {xisenjin, asadhu, junyidu, EMAIL} |
| Pseudocode | Yes | Algorithm 1: Gradient Memory EDiting with ER (ER+GMED) |
| Open Source Code | Yes | Code can be found at https://github.com/INK-USC/GMED. |
| Open Datasets | Yes | We use six public CL datasets in our experiments. Split / Permuted / Rotated MNIST are constructed from the MNIST [21] dataset which contains images of handwritten digits. We also employ Split CIFAR-10 and Split CIFAR-100, which comprise of 5 and 20 disjoint subsets respectively based on their class labels. Similarly, Split mini-ImageNet [2] splits the mini-ImageNet [10, 42] dataset into 20 disjoint subsets based on their labels. |
| Dataset Splits | No | For all MNIST experiments, each task consists of 1,000 training examples following [2]. We set the size of replay memory as 10K for split CIFAR-100 and split mini-ImageNet, and 500 for all remaining datasets. |
| Hardware Specification | No | Computational Efficiency. We analyze the additional forward and backward computation required by ER+GMED and MIR. Compared to ER, ER+GMED adds 3 forward and 1 backward passes to estimate loss increase, and 1 backward pass to update the example. In comparison, MIR adds 3 forward and 1 backward passes with 2 of the forward passes are over a larger set of retrieval candidates. In our experiments, we found GMED has similar training time cost as MIR. In Appendix B, we report the wall-clock time, and observe the run-time of ER+GMED is 1.5 times of ER. |
| Software Dependencies | No | For model architectures, we mostly follow the setup of [2]: for the three MNIST datasets, we use a MLP classifier with 2 hidden layers with 400 hidden units each. For Split CIFAR-10, Split CIFAR-100 and Split mini-ImageNet datasets, we use a ResNet-18 classifier with three times less feature maps across all layers. |
| Experiment Setup | Yes | We set the size of replay memory as 10K for split CIFAR-100 and split mini-ImageNet, and 500 for all remaining datasets. Following [8], we tune the hyper-parameters α (editing stride) and β (regularization strength) with only the first three tasks. While γ (decay rate of the editing stride) is a hyper-parameter that may flexibly control the deviation of edited examples from their original states, we find γ=1.0 (i.e., no decay) leads to better performance in our experiments. |
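
The setup quoted in the Experiment Setup row can be written down as a small configuration sketch. This is an illustration only, not the authors' code: the dataset keys and both function names are hypothetical, and the geometric form of the stride decay is an assumption, since the quoted text says only that γ is the decay rate of the editing stride and that γ=1.0 means no decay.

```python
# Hypothetical dataset keys; the quoted text names the datasets, not these identifiers.
LARGE_MEMORY_DATASETS = {"split_cifar100", "split_mini_imagenet"}


def replay_memory_size(dataset: str) -> int:
    """Replay memory size as quoted: 10K for Split CIFAR-100 and
    Split mini-ImageNet, 500 for all remaining datasets."""
    return 10_000 if dataset in LARGE_MEMORY_DATASETS else 500


def editing_stride(alpha: float, gamma: float, num_edits: int) -> float:
    """Stride applied to an example that has already been edited num_edits times.

    ASSUMPTION: the decay is geometric (alpha * gamma ** num_edits). The quoted
    text does not give the formula; it only states that gamma = 1.0 disables decay.
    """
    return alpha * gamma ** num_edits
```

With γ=1.0, the value the quoted text reports as working best, `editing_stride` returns α unchanged however many times an example has been edited.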