Disentangling and mitigating the impact of task similarity for continual learning

Authors: Naoki Hiratani

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate consistent results in a permuted MNIST task with latent variables. Overall, this work provides insights into when continual learning is difficult and how to mitigate it. [...] Furthermore, we test our key predictions numerically in a permuted MNIST task with a latent structure.
Researcher Affiliation | Academia | Naoki Hiratani, Department of Neuroscience, Washington University in St Louis, St Louis, MO 63110, hiratani@wustl.edu
Pseudocode | No | The paper describes methods mathematically and textually but does not include any structured pseudocode or algorithm blocks.
Open Source Code | Yes | Source codes for all numerical results are made publicly available at https://github.com/nhiratani/transfer_retention_model.
Open Datasets | Yes | We used permuted MNIST dataset [34, 22], a common benchmark for continual learning, but with addition of the latent space.
Dataset Splits | No | The paper describes training and testing procedures and parameters like epochs and learning rates, but it does not explicitly mention the use of a separate validation dataset or how a validation split was performed.
Hardware Specification | No | Numerical experiments were conducted in standard laboratory GPUs and CPUs. The paper does not provide specific models or detailed specifications for the hardware used.
Software Dependencies | No | The paper states that source code is available but does not explicitly list specific software dependencies or their version numbers within the text.
Experiment Setup | Yes | We set the latent variable dimensionality Ns = 30, the input width Nx = 3000, and the output width Ny = 10. The student weight W was initialized as the zero matrix, and updated with the full gradient descent with learning rate η = 0.001. [...] We set the hidden layer width to Nh = 1500. The input and output widths were set to Nx = 784 and Ny = 10. [...] We set the mini-batch size to 300 and the learning rate to η = 0.01, and trained the network for 100 epochs per task.
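
The Experiment Setup row quotes concrete hyperparameters for the permuted MNIST experiment. Below is a minimal sketch, not the authors' released code (see the repository linked above), that wires the reported values (Nh = 1500, Nx = 784, Ny = 10, mini-batch size 300, learning rate 0.01, 100 epochs per task) into a standard PyTorch training loop. The two-layer architecture, ReLU nonlinearity, cross-entropy loss, plain SGD, and number of tasks are assumptions not stated in the excerpt, and the paper's latent task structure is not reproduced here.

```python
# Minimal sketch of a permuted-MNIST continual-learning run using the
# hyperparameters reported in the Experiment Setup row. Architecture, loss,
# optimizer, and number of tasks are assumptions for illustration only.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

Nx, Nh, Ny = 784, 1500, 10            # input, hidden, and output widths (reported)
lr, batch_size, epochs_per_task = 0.01, 300, 100  # reported training settings
num_tasks = 5                          # illustrative; not specified in the excerpt

# Flatten each image into a length-784 vector.
transform = transforms.Compose([transforms.ToTensor(),
                                transforms.Lambda(lambda x: x.view(-1))])
train_set = datasets.MNIST("./data", train=True, download=True, transform=transform)
loader = DataLoader(train_set, batch_size=batch_size, shuffle=True)

# Assumed two-layer MLP student; the paper's latent structure is omitted here.
model = nn.Sequential(nn.Linear(Nx, Nh), nn.ReLU(), nn.Linear(Nh, Ny))
opt = torch.optim.SGD(model.parameters(), lr=lr)
loss_fn = nn.CrossEntropyLoss()

# Each task applies a fixed random permutation of the input pixels.
permutations = [torch.randperm(Nx) for _ in range(num_tasks)]

for task_id, perm in enumerate(permutations):
    for epoch in range(epochs_per_task):
        for x, y in loader:
            opt.zero_grad()
            loss = loss_fn(model(x[:, perm]), y)
            loss.backward()
            opt.step()
    print(f"finished task {task_id}, last mini-batch loss {loss.item():.4f}")
```

For the authors' actual implementation, including the latent-variable construction and the linear teacher-student experiments (Ns = 30, Nx = 3000, zero-initialized W, full gradient descent with η = 0.001), refer to the repository listed in the Open Source Code row.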