Information-theoretic Online Memory Selection for Continual Learning
Authors: Shengyang Sun, Daniele Calandriello, Huiyi Hu, Ang Li, Michalis Titsias
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, we demonstrate that the proposed information-theoretic criteria encourage the selection of representative memories for learning the underlying function. We also evaluate on standard continual learning benchmarks and demonstrate the advantage of our proposed reservoir sampler over strong GCL baselines at various levels of data imbalance. |
| Researcher Affiliation | Collaboration | Shengyang Sun (University of Toronto, Vector Institute), Daniele Calandriello (DeepMind), Huiyi Hu (Google Brain), Ang Li (Baidu Apollo), Michalis K. Titsias (DeepMind) |
| Pseudocode | Yes | Algorithm 1 Information-theoretic Reservoir Sampling (Info RS)... Algorithm 2 Information-theoretic Greedy Selection (Info GS)... Algorithm 3 Reservoir Sampling (Vitter, 1985)... Algorithm 4 Weighted Reservoir Sampling (Chao, 1982; Efraimidis & Spirakis, 2006)... Algorithm 5 Class-Balanced Reservoir Sampling (Chrysakis & Moens, 2020) |
| Open Source Code | No | The paper includes a Reproducibility Statement that details the pseudocode and hyper-parameters, but it does not provide any statement about releasing source code or a link to a code repository for the described methodology. |
| Open Datasets | Yes | The benchmarks involve Permuted MNIST, Split MNIST, Split CIFAR10, and Split MiniImageNet. |
| Dataset Splits | Yes | To tune the hyper-parameters, we hold out 10% of the training data as a validation set, then pick the best hyper-parameters based on validation accuracy averaged over 5 random seeds. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., CPU, GPU models or configurations) used for running the experiments, only general model architectures are mentioned. |
| Software Dependencies | No | The paper mentions software components like 'stochastic gradient descent optimizer' and refers to a model implementation (DER++), but it does not provide specific version numbers for any software, libraries, or frameworks used (e.g., 'Python 3.8, PyTorch 1.9, and CUDA 11.1'). |
| Experiment Setup | Yes | To tune the hyper-parameters, we hold out 10% of the training data as a validation set, then pick the best hyper-parameters based on validation accuracy averaged over 5 random seeds. The tuned hyper-parameters include the learning rate lr, the logit regularization coefficient α, the target regularization coefficient β, the learnability ratio η, and the information thresholding ratio γi, if needed. We present the detailed hyper-parameters in Table 1. |
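
The Pseudocode row above lists plain reservoir sampling (Vitter, 1985) among the documented algorithms, which the paper's Info RS sampler builds on. Below is a minimal sketch of that standard baseline only, not the paper's information-theoretic criterion; the function name `reservoir_sample` and buffer size `m` are illustrative choices.

```python
import random

def reservoir_sample(stream, m, rng=None):
    """Maintain a uniform random sample of size m over a data stream
    (Vitter, 1985, Algorithm R)."""
    rng = rng or random.Random(0)
    memory = []
    for t, x in enumerate(stream):
        if len(memory) < m:
            memory.append(x)          # fill the buffer with the first m items
        else:
            j = rng.randint(0, t)     # inclusive draw from {0, ..., t}
            if j < m:                 # item t is kept with probability m / (t + 1)
                memory[j] = x
    return memory

# Example: keep a uniform sample of 5 items from a stream of 100 integers.
print(reservoir_sample(range(100), m=5))
```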
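The Dataset Splits and Experiment Setup rows describe the tuning protocol: 10% of the training data is held out as a validation set and configurations are compared by validation accuracy averaged over 5 random seeds. The sketch below illustrates that protocol under stated assumptions; `train_and_eval` is a hypothetical callback that trains a model with one hyper-parameter configuration and a given seed and returns its validation accuracy.

```python
import statistics

def select_hyperparameters(train_data, configs, train_and_eval,
                           seeds=range(5), val_frac=0.1):
    """Pick the configuration with the best validation accuracy
    averaged over random seeds (10% of training data held out)."""
    n_val = int(len(train_data) * val_frac)
    val_split, train_split = train_data[:n_val], train_data[n_val:]
    best_cfg, best_acc = None, float("-inf")
    for cfg in configs:
        accs = [train_and_eval(train_split, val_split, cfg, seed=s) for s in seeds]
        mean_acc = statistics.mean(accs)
        if mean_acc > best_acc:
            best_cfg, best_acc = cfg, mean_acc
    return best_cfg
```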