Continual Learning via Local Module Composition
Authors: Oleksiy Ostapenko, Pau Rodriguez, Massimo Caccia, Laurent Charlin
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In the first set of experiments, LMC performs favorably compared to existing methods on the recent Continual Transfer-learning Benchmark without requiring task identities. In another study, we show that the locality of structural learning allows LMC to interpolate to related but unseen tasks (OOD), as well as to compose modular networks trained independently on different task sequences into a third modular network without any fine-tuning. |
| Researcher Affiliation | Collaboration | 1Mila Quebec AI Institute, 2Université de Montréal, 3Service Now, 4HEC Montréal, 5Canada CIFAR AI Chair |
| Pseudocode | No | The paper describes the model architecture and training process in detail using prose and mathematical formulas, but it does not include explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | The codebase is available under https://github.com/oleksost/LMC. |
| Open Datasets | Yes | We use a colored-MNIST dataset, a variation of the standard MNIST dataset... We designed the following tasks: fMNIST+ and fMNIST-. Both are sampled from the fashion-MNIST dataset [98]... We trained LMC1 continually on MNIST, fMNIST+, and cMNIST-r tasks. We trained LMC2 on fMNIST-, cMNIST-g, and SVHN. |
| Dataset Splits | Yes | We design a simple sequence of tasks as follows. First, we define two high-level features: the foreground-background color combination (using the colors red, black, green, blue) and the class (0-9). Then, we create five non-overlapping tasks of two (digit) classes each: {0-1, ..., 8-9}. At training time the model is continually trained using a sequence of these tasks; however, each task is only seen in one of five different foreground-background combinations {red-black, green-black, blue-black, black-red, black-green}. At test time we measure the generalization ability to seen and unseen combinations of classes and colors. |
| Hardware Specification | No | The paper mentions 'Mila and Compute Canada for providing computational resources' but does not specify any particular hardware components such as CPU, GPU models, or memory. |
| Software Dependencies | No | The paper mentions optimizers like 'Adam [44]' and 'SGD' but does not specify software versions for any libraries, frameworks (e.g., PyTorch, TensorFlow), or programming languages used. |
| Experiment Setup | Yes | In our experiments, unless otherwise stated, a module consists of a single convolutional layer followed by batch-norm, ReLU activation, and a max-pooling operation. ... A new module is added to a layer when all modules in this layer detect an outlier input. To this end, we track the running statistics of the relatedness score γ for each module (mean µ and variance σ; see Figure 1), and calculate a z-score for each sample in the batch and each module m at a layer l: z_m^(l) = (γ_m^(l) − µ_m^(l)) / σ_m^(l). An input is considered an outlier if its z-score is larger than a predefined threshold z (see Appendix B.6 for an ablation study of z values). Unless stated otherwise, in our experiments, the decision was made on a per-batch basis. |
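The dataset-splits row describes how the colored-MNIST task sequence pairs digit-class tasks with foreground-background color combinations. The following sketch enumerates that pairing; all names (`CLASS_PAIRS`, `TRAIN_COLORS`, etc.) are assumptions for illustration, not the authors' code.

```python
# Hypothetical reconstruction of the colored-MNIST task design described above.
# Five non-overlapping two-class tasks, each seen in exactly one training color combo.
CLASS_PAIRS = [(0, 1), (2, 3), (4, 5), (6, 7), (8, 9)]
TRAIN_COLORS = [  # (foreground, background) seen during training, one per task
    ("red", "black"), ("green", "black"), ("blue", "black"),
    ("black", "red"), ("black", "green"),
]

# All foreground-background combinations over the four colors (fg != bg).
ALL_COLORS = [(fg, bg)
              for fg in ("red", "black", "green", "blue")
              for bg in ("red", "black", "green", "blue")
              if fg != bg]

# Training: each class pair is tied to a single color combination.
train_tasks = list(zip(CLASS_PAIRS, TRAIN_COLORS))

# Test: generalization is measured over every class/color combination,
# including pairings never seen together during training.
test_tasks = [(pair, color) for pair in CLASS_PAIRS for color in ALL_COLORS]
```

Under this construction there are 5 training tasks but 5 × 12 = 60 test-time class/color combinations, so most test combinations are unseen.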
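The experiment-setup row quotes LMC's module-expansion rule: track running statistics of each module's relatedness score γ, compute a z-score per sample, and add a module when every module at a layer flags the input as an outlier. A minimal sketch of that rule, assuming Welford-style running statistics (the class and function names here are illustrative, not from the paper's codebase):

```python
import math

class RunningStats:
    """Running mean and variance of a module's relatedness score (Welford's algorithm)."""
    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the running mean

    def update(self, gamma):
        self.n += 1
        delta = gamma - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (gamma - self.mean)

    @property
    def std(self):
        return math.sqrt(self.m2 / self.n) if self.n > 1 else 1.0

def is_outlier(gamma, stats, z_threshold=2.0):
    """Module-level test: z = |gamma - mu| / sigma exceeds the threshold z."""
    z = abs(gamma - stats.mean) / max(stats.std, 1e-8)
    return z > z_threshold

def should_add_module(gammas, layer_stats, z_threshold=2.0):
    """Layer-level rule: expand only when ALL existing modules detect an outlier."""
    return all(is_outlier(g, s, z_threshold)
               for g, s in zip(gammas, layer_stats))
```

The all-modules condition is the conservative choice quoted above: a single module that still finds the input familiar is enough to suppress expansion.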