Disentangling and mitigating the impact of task similarity for continual learning
Authors: Naoki Hiratani
NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate consistent results in a permuted MNIST task with latent variables. Overall, this work provides insights into when continual learning is difficult and how to mitigate it. [...] Furthermore, we test our key predictions numerically in a permuted MNIST task with a latent structure. |
| Researcher Affiliation | Academia | Naoki Hiratani, Department of Neuroscience, Washington University in St. Louis, St. Louis, MO 63110, hiratani@wustl.edu |
| Pseudocode | No | The paper describes methods mathematically and textually but does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Source codes for all numerical results are made publicly available at https://github.com/nhiratani/transfer_retention_model. |
| Open Datasets | Yes | We used permuted MNIST dataset [34, 22], a common benchmark for continual learning, but with addition of the latent space. |
| Dataset Splits | No | The paper describes training and testing procedures and parameters like epochs and learning rates, but it does not explicitly mention the use of a separate validation dataset or how a validation split was performed. |
| Hardware Specification | No | Numerical experiments were conducted in standard laboratory GPUs and CPUs. The paper does not provide specific models or detailed specifications for the hardware used. |
| Software Dependencies | No | The paper states that source code is available but does not explicitly list specific software dependencies or their version numbers within the text. |
| Experiment Setup | Yes | We set the latent variable dimensionality Ns = 30, the input width Nx = 3000, and the output width Ny = 10. The student weight W was initialized as the zero matrix, and updated with the full gradient descent with learning rate η = 0.001. [...] We set the hidden layer width to Nh = 1500. The input and output widths were set to Nx = 784 and Ny = 10. [...] We set the mini-batch size to 300 and the learning rate to η = 0.01, and trained the network for 100 epochs per task. (An illustrative configuration sketch follows the table.) |
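
The Experiment Setup excerpt quotes concrete hyperparameters for the permuted MNIST experiment: hidden width Nh = 1500, input width Nx = 784, output width Ny = 10, mini-batch size 300, learning rate η = 0.01, and 100 epochs per task. The sketch below is a minimal, assumption-laden reconstruction of that configuration using PyTorch; it is not the authors' code, it omits the paper's latent-structure construction of the tasks, and the network architecture details (activation, number of layers) are assumptions. The official implementation is available at https://github.com/nhiratani/transfer_retention_model.

```python
# Hypothetical sketch of the permuted-MNIST configuration quoted above
# (Nh = 1500, Nx = 784, Ny = 10, batch size 300, lr = 0.01, 100 epochs/task).
# NOT the authors' code: the paper's latent structure is not reproduced here.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

Nx, Nh, Ny = 784, 1500, 10          # input, hidden, output widths (from the paper)
BATCH, LR, EPOCHS = 300, 0.01, 100  # mini-batch size, learning rate, epochs per task

# Assumed architecture: a single-hidden-layer ReLU network of width Nh.
model = nn.Sequential(nn.Linear(Nx, Nh), nn.ReLU(), nn.Linear(Nh, Ny))
optimizer = torch.optim.SGD(model.parameters(), lr=LR)
loss_fn = nn.CrossEntropyLoss()

mnist = datasets.MNIST("data", train=True, download=True,
                       transform=transforms.ToTensor())
loader = DataLoader(mnist, batch_size=BATCH, shuffle=True)

n_tasks = 2  # illustrative; the paper studies sequences of related tasks
for task in range(n_tasks):
    perm = torch.randperm(Nx)  # a fixed pixel permutation defines each task
    for epoch in range(EPOCHS):
        for x, y in loader:
            x = x.view(-1, Nx)[:, perm]  # apply this task's permutation
            optimizer.zero_grad()
            loss_fn(model(x), y).backward()
            optimizer.step()
```

In the paper, consecutive tasks share a latent structure that controls their similarity; reproducing that construction requires the authors' repository, so the standard independent-permutation variant above is shown only to make the quoted hyperparameters concrete.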