Using Hindsight to Anchor Past Knowledge in Continual Learning
Authors: Arslan Chaudhry, Albert Gordo, Puneet Dokania, Philip Torr, David Lopez-Paz
AAAI 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on several supervised learning benchmarks for continual learning demonstrate that our approach improves the standard experience replay in terms of both accuracy and forgetting metrics and for various sizes of episodic memory. |
| Researcher Affiliation | Collaboration | University of Oxford; Facebook AI |
| Pseudocode | Yes | Hindsight Anchor Learning (HAL): Initialize θ ∼ P(θ) and {e_t ∼ P(e)} for t = 1, …, T from normal distributions, and set M = {}. For each task t = 1, …, T: for each minibatch B from task t, (i) sample B_M from M, (ii) update θ using Eq. 5, (iii) update φ_t using Eq. 8, (iv) add B to M via a FIFO ring buffer; then fine-tune on M to obtain θ_M, build e_t with k steps of Eq. 9, and discard φ_t. Return θ. (A hedged code sketch is given after the table.) |
| Open Source Code | Yes | All the other baselines use our unified code base, which is available at https://github.com/arslan-chaudhry/Hindsight Anchor |
| Open Datasets | Yes | Permuted MNIST is a variant of the MNIST dataset of handwritten digits (LeCun 1998) ... Split CIFAR is a variant of the CIFAR100 dataset (Krizhevsky and Hinton 2009; Zenke, Poole, and Ganguli 2017) ... Split miniImageNet is a variant of the ImageNet dataset (Russakovsky et al. 2015; Vinyals et al. 2016) |
| Dataset Splits | Yes | For all datasets, the first 3 tasks are used for hyperparameter optimization (grids available in the Appendix). The learner can perform multiple epochs on these three initial tasks, which are later discarded for evaluation. |
| Hardware Specification | No | The paper describes the neural network architectures used (e.g., 'perceptron with two hidden layers', 'ResNet18') but does not specify any hardware details such as GPU models (e.g., NVIDIA V100, RTX 3090) or CPU types. |
| Software Dependencies | No | The paper mentions running official implementations and a unified codebase, with links to GitHub repositories, but does not specify versions for the software stack (e.g., Python, PyTorch, or TensorFlow), which are essential for full reproducibility. |
| Experiment Setup | Yes | Batch size is set to 10 for both the stream of data and episodic memory across experiments and models. The size of episodic memory is set between 1 and 5 examples per class per task. |
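The pseudocode row above, together with the batch and memory settings in the last row, maps onto a standard replay-style training loop. Below is a minimal PyTorch-style sketch reconstructed from that pseudocode only; it is not the authors' released code. The `MLP` class, the hyperparameter names (`lr`, `lam`, `gamma`, `k`, `anchor_lr`, `mem_size`), and the exact forms used here for Eqs. 5, 8, and 9 are illustrative assumptions.

```python
# Minimal PyTorch-style sketch of Hindsight Anchor Learning (HAL), reconstructed
# from the pseudocode row above. Not the authors' implementation; the model,
# hyperparameters, and exact loss forms are illustrative assumptions.
import copy
import random
import torch
import torch.nn.functional as F


class MLP(torch.nn.Module):
    """Two-hidden-layer perceptron with an embedding hook used for phi (Eq. 8)."""
    def __init__(self, d_in=784, d_hidden=256, n_classes=10):
        super().__init__()
        self.body = torch.nn.Sequential(
            torch.nn.Linear(d_in, d_hidden), torch.nn.ReLU(),
            torch.nn.Linear(d_hidden, d_hidden), torch.nn.ReLU())
        self.head = torch.nn.Linear(d_hidden, n_classes)

    def embed(self, x):
        return self.body(x)

    def forward(self, x):
        return self.head(self.embed(x))


def hal_train(model, tasks, mem_size=250, lr=0.1, lam=0.1,
              gamma=0.5, k=100, anchor_lr=0.1):
    """tasks: list of iterables, each yielding (x, y) minibatches of one task."""
    memory, anchors = [], []            # FIFO ring buffer / per-task anchors e_t
    opt = torch.optim.SGD(model.parameters(), lr=lr)

    for t, loader in enumerate(tasks):
        phi = None                      # running mean embedding of task t
        for x, y in loader:
            # Replay: mix the stream minibatch B with a minibatch B_M from memory.
            if memory:
                xm, ym = zip(*random.sample(memory, min(10, len(memory))))
                xb = torch.cat([x, torch.stack(xm)])
                yb = torch.cat([y, torch.stack(ym)])
            else:
                xb, yb = x, y

            # One-step "hindsight" lookahead copy of the current parameters.
            tmp = copy.deepcopy(model)
            tmp_opt = torch.optim.SGD(tmp.parameters(), lr=lr)
            tmp_opt.zero_grad()
            F.cross_entropy(tmp(xb), yb).backward()
            tmp_opt.step()

            # Eq. 5 (sketch): replay loss plus a penalty keeping the outputs on
            # past anchors stable under the lookahead update.
            opt.zero_grad()
            loss = F.cross_entropy(model(xb), yb)
            for e, _ in anchors:
                loss = loss + lam * (model(e) - tmp(e).detach()).pow(2).mean()
            loss.backward()
            opt.step()

            # Eq. 8 (sketch): exponential moving average of task-t embeddings.
            with torch.no_grad():
                emb = model.embed(x).mean(0)
                phi = emb if phi is None else gamma * phi + (1 - gamma) * emb

            # FIFO ring buffer (the paper budgets 1-5 examples per class per task).
            memory.extend(zip(x, y))
            memory = memory[-mem_size:]

        # Fine-tune a copy of theta on memory only, giving theta_M.
        theta_m = copy.deepcopy(model)
        ft_opt = torch.optim.SGD(theta_m.parameters(), lr=lr)
        for xi, yi in memory:
            ft_opt.zero_grad()
            F.cross_entropy(theta_m(xi[None]), yi[None]).backward()
            ft_opt.step()

        # Eq. 9 (sketch): k gradient-ascent steps that raise theta_M's loss on the
        # anchor while keeping its embedding close to phi; phi is then discarded.
        e = torch.randn_like(x[:1]).requires_grad_(True)
        ye = y[:1]                      # one anchor here; the paper builds one per class
        for _ in range(k):
            obj = (F.cross_entropy(theta_m(e), ye)
                   - lam * (theta_m.embed(e) - phi).pow(2).mean())
            grad, = torch.autograd.grad(obj, e)
            e = (e + anchor_lr * grad).detach().requires_grad_(True)
        anchors.append((e.detach(), ye))
    return model
```

Under these assumptions, `hal_train(MLP(), tasks)` would run over a list of per-task minibatch iterables with batch size 10, matching the Experiment Setup row; the paper itself builds one anchor per class per task and tunes hyperparameters on the first three tasks, as noted in the Dataset Splits row.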