Information-theoretic Online Memory Selection for Continual Learning

Authors: Shengyang Sun, Daniele Calandriello, Huiyi Hu, Ang Li, Michalis Titsias

ICLR 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirically, we demonstrate that the proposed information-theoretic criteria encourage to select representative memories for learning the underlying function. We also conduct standard continual learning benchmarks and demonstrate the advantage of our proposed reservoir sampler over strong GCL baselines at various levels of data imbalance.
Researcher Affiliation | Collaboration | Shengyang Sun¹, Daniele Calandriello⁴, Huiyi Hu², Ang Li³, Michalis K. Titsias⁴; ¹University of Toronto, ¹Vector Institute, ²Google Brain, ³Baidu Apollo, ⁴DeepMind
Pseudocode | Yes | Algorithm 1 Information-theoretic Reservoir Sampling (InfoRS)... Algorithm 2 Information-theoretic Greedy Selection (InfoGS)... Algorithm 3 Reservoir Sampling (Vitter, 1985)... Algorithm 4 Weighted Reservoir Sampling (Chao, 1982; Efraimidis & Spirakis, 2006)... Algorithm 5 Class-Balanced Reservoir Sampling (Chrysakis & Moens, 2020). (A sketch of the classical reservoir sampling routine cited in Algorithm 3 follows the table.)
Open Source Code | No | The paper includes a Reproducibility Statement that details pseudocode and hyper-parameters, but it does not provide any statement about releasing source code or a link to a code repository for the described methodology.
Open Datasets | Yes | The benchmarks involve Permuted MNIST, Split MNIST, Split CIFAR10, and Split MiniImageNet.
Dataset Splits | Yes | To tune the hyper-parameters, we pick 10% of training data as the validation set, then we pick the best hyper-parameter based on the averaged validation accuracy over 5 random seeds.
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., CPU, GPU models or configurations) used for running the experiments; only general model architectures are mentioned.
Software Dependencies | No | The paper mentions software components like 'stochastic gradient descent optimizer' and refers to a model implementation (DER++), but it does not provide specific version numbers for any software, libraries, or frameworks used (e.g., 'Python 3.8, PyTorch 1.9, and CUDA 11.1').
Experiment Setup | Yes | To tune the hyper-parameters, we pick 10% of training data as the validation set, then we pick the best hyper-parameter based on the averaged validation accuracy over 5 random seeds. The tuning hyper-parameters include the learning rate lr, the logit regularization coefficient α, the target regularization coefficient β, the learnability ratio η, and the information thresholding ratio γi, if needed. We present the detailed hyper-parameters in Table 1. (An illustrative sketch of this tuning protocol follows the table.)
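
For context on the pseudocode row above, here is a minimal Python sketch of the classical reservoir sampling algorithm (Vitter, 1985) that Algorithm 3 references. This is only the uniform baseline, not the paper's InfoRS, which additionally gates insertion with information-theoretic criteria; the function name and signature are illustrative.

```python
import random

def reservoir_sample(stream, capacity, seed=0):
    """Classical reservoir sampling (Vitter, 1985): keep a uniform random
    subset of `capacity` items from a stream of unknown length."""
    rng = random.Random(seed)
    buffer = []
    for t, item in enumerate(stream):
        if len(buffer) < capacity:
            buffer.append(item)      # fill the buffer until it reaches capacity
        else:
            j = rng.randint(0, t)    # uniform index over all t + 1 items seen so far
            if j < capacity:         # item is stored with probability capacity / (t + 1)
                buffer[j] = item     # evict a uniformly chosen stored item
    return buffer

# Example: maintain a 100-point memory over a stream of 10,000 observations.
memory = reservoir_sample(range(10_000), capacity=100)
```
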
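The dataset-split and experiment-setup rows describe the same tuning protocol: hold out 10% of the training data for validation and choose the hyper-parameters with the best validation accuracy averaged over 5 random seeds. The sketch below is an assumption-laden illustration of that loop; `train_and_eval` is a hypothetical callback standing in for training a continual learner and returning its validation accuracy, and the grid of configurations is left unspecified.

```python
def tune_hyperparameters(configs, train_set, train_and_eval,
                         val_fraction=0.1, num_seeds=5):
    """Pick the config with the best validation accuracy averaged over seeds.

    `configs` is an iterable of hyper-parameter settings (e.g. dicts with
    lr, alpha, beta, eta, gamma); `train_and_eval(config, train, val, seed)`
    is a hypothetical callback that returns a validation accuracy.
    """
    n_val = int(len(train_set) * val_fraction)   # 10% of training data held out
    val_split, train_split = train_set[:n_val], train_set[n_val:]

    best_config, best_score = None, float("-inf")
    for config in configs:
        accs = [train_and_eval(config, train_split, val_split, seed)
                for seed in range(num_seeds)]    # 5 random seeds
        mean_acc = sum(accs) / len(accs)         # averaged validation accuracy
        if mean_acc > best_score:
            best_config, best_score = config, mean_acc
    return best_config
```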