Continual Learning via Local Module Composition
Authors: Oleksiy Ostapenko, Pau Rodriguez, Massimo Caccia, Laurent Charlin
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In the first set of experiments, LMC performs favorably compared to existing methods on the recent Continual Transfer-learning Benchmark without requiring task identities. In another study, we show that the locality of structural learning allows LMC to interpolate to related but unseen tasks (OOD), as well as to compose modular networks trained independently on different task sequences into a third modular network without any fine-tuning. |
| Researcher Affiliation | Collaboration | 1Mila Quebec AI Institute, 2Université de Montréal, 3Service Now, 4HEC Montréal, 5Canada CIFAR AI Chair |
| Pseudocode | No | The paper describes the model architecture and training process in detail using prose and mathematical formulas, but it does not include explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | The codebase is available under https://github.com/oleksost/LMC. |
| Open Datasets | Yes | We use a colored-MNIST dataset, a variation of the standard MNIST dataset... We designed the following tasks: fMNIST+ and fMNIST-. Both are sampled from the fashion-MNIST dataset [98]... We trained LMC1 continually on MNIST, fMNIST+, and cMNIST-r tasks. We trained LMC2 on fMNIST-, cMNIST-g, and SVHN. |
| Dataset Splits | Yes | We design a simple sequence of tasks as follows. First, we define two high-level features: the foreground-background color combination (using the colors red, black, green, blue) and the class (0-9). Then, we create five non-overlapping tasks of two (digit) classes each: {0-1, ..., 8-9}. At training time the model is continually trained using a sequence of these tasks; however, each task is only seen in one of five different foreground-background combinations {red-black, green-black, blue-black, black-red, black-green}. At test time we measure the generalization ability to seen and unseen combinations of classes and colors. |
| Hardware Specification | No | The paper mentions 'Mila and Compute Canada for providing computational resources' but does not specify any particular hardware components such as CPU, GPU models, or memory. |
| Software Dependencies | No | The paper mentions optimizers like 'Adam [44]' and 'SGD' but does not specify software versions for any libraries, frameworks (e.g., PyTorch, TensorFlow), or programming languages used. |
| Experiment Setup | Yes | In our experiments, unless otherwise stated, a module consists of a single convolutional layer followed by batch-norm, ReLU activation, and a max-pooling operation. ... A new module is added to a layer when all modules in this layer detect an outlier input. To this end, we track the running statistics of the relatedness score γ for each module (mean µ and variance σ; see Figure 1), and calculate a z-score for each sample in the batch and each module m at a layer l: z_m^(l) = (γ_m^(l) − µ_m^(l)) / σ_m^(l). An input is considered an outlier if its z-score is larger than a predefined threshold z (see Appendix B.6 for an ablation study of z values). Unless stated otherwise, in our experiments, the decision was made on a per-batch basis. |
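The dataset-splits row describes how the colored-MNIST task sequence pairs digit-class tasks with foreground-background color combinations. The following sketch enumerates that pairing; all names (`CLASS_PAIRS`, `TRAIN_COLORS`, etc.) are assumptions for illustration, not the authors' code.

```python
# Hypothetical reconstruction of the colored-MNIST task design described above.
# Five non-overlapping two-class tasks, each seen in exactly one training color combo.
CLASS_PAIRS = [(0, 1), (2, 3), (4, 5), (6, 7), (8, 9)]
TRAIN_COLORS = [  # (foreground, background) seen during training, one per task
    ("red", "black"), ("green", "black"), ("blue", "black"),
    ("black", "red"), ("black", "green"),
]

# All foreground-background combinations over the four colors (fg != bg).
ALL_COLORS = [(fg, bg)
              for fg in ("red", "black", "green", "blue")
              for bg in ("red", "black", "green", "blue")
              if fg != bg]

# Training: each class pair is tied to a single color combination.
train_tasks = list(zip(CLASS_PAIRS, TRAIN_COLORS))

# Test: generalization is measured over every class/color combination,
# including pairings never seen together during training.
test_tasks = [(pair, color) for pair in CLASS_PAIRS for color in ALL_COLORS]
```

Under this construction there are 5 training tasks but 5 × 12 = 60 test-time class/color combinations, so most test combinations are unseen.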
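The experiment-setup row quotes LMC's module-expansion rule: track running statistics of each module's relatedness score γ, compute a z-score per sample, and add a module when every module at a layer flags the input as an outlier. A minimal sketch of that rule, assuming Welford-style running statistics (the class and function names here are illustrative, not from the paper's codebase):

```python
import math

class RunningStats:
    """Running mean and variance of a module's relatedness score (Welford's algorithm)."""
    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the running mean

    def update(self, gamma):
        self.n += 1
        delta = gamma - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (gamma - self.mean)

    @property
    def std(self):
        return math.sqrt(self.m2 / self.n) if self.n > 1 else 1.0

def is_outlier(gamma, stats, z_threshold=2.0):
    """Module-level test: z = |gamma - mu| / sigma exceeds the threshold z."""
    z = abs(gamma - stats.mean) / max(stats.std, 1e-8)
    return z > z_threshold

def should_add_module(gammas, layer_stats, z_threshold=2.0):
    """Layer-level rule: expand only when ALL existing modules detect an outlier."""
    return all(is_outlier(g, s, z_threshold)
               for g, s in zip(gammas, layer_stats))
```

The all-modules condition is the conservative choice quoted above: a single module that still finds the input familiar is enough to suppress expansion.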