An Empirical Investigation of Catastrophic Forgetting in Gradient-Based Neural Networks
Authors: Ian J. Goodfellow, Mehdi Mirza, Da Xiao, Aaron Courville, Yoshua Bengio
ICLR 2014
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | An Empirical Investigation of Catastrophic Forgetting in Gradient-Based Neural Networks. Here, we investigate the extent to which the catastrophic forgetting problem occurs for modern neural networks, comparing both established and recent gradient-based training algorithms and activation functions. We also examine the effect of the relationship between the first task and the second task on catastrophic forgetting. |
| Researcher Affiliation | Academia | Ian J. Goodfellow (goodfeli@iro.umontreal.ca), Mehdi Mirza (mirzamom@iro.umontreal.ca), Da Xiao (xiaoda99@bupt.edu.cn), Aaron Courville (aaron.courville@umontreal.ca), Yoshua Bengio (yoshua.bengio@umontreal.ca) |
| Pseudocode | No | The paper describes algorithms in text, but no structured pseudocode blocks or algorithm listings are present. |
| Open Source Code | Yes | Code associated with this paper is available at https://github.com/goodfeli/forgetting. |
| Open Datasets | Yes | Specifically, we used MNIST classification, but with a different permutation of the pixels for the old task and the new task. To test this case, we used sentiment analysis of two product categories of Amazon reviews (Blitzer et al., 2007) as the two tasks. |
| Dataset Splits | No | The paper mentions using a validation set to stop training and refers to the 'MNIST validation set' and 'Amazon validation set', with some detail about subsampling, but it does not provide explicit split percentages or sample counts for all datasets, nor citations to predefined splits sufficient for reproduction. |
| Hardware Specification | No | The acknowledgments credit NSERC, Compute Canada, and Calcul Québec for computational resources, but the paper does not specify any particular hardware components, such as CPU or GPU models, used for the experiments. |
| Software Dependencies | No | The paper acknowledges the use of 'Theano' and 'Pylearn2', citing their respective papers, but does not provide specific version numbers for these or any other software dependencies. |
| Experiment Setup | Yes | For each of these eight conditions, we randomly generate 25 random sets of hyperparameters. ... The hyperparameters we search over include the magnitude of the maxnorm constraint (Srebro & Shraibman, 2005) for each layer, the method used to initialize the weights for each layer and any hyper-parameters associated with such method, the initial biases for each layer, the parameters controlling a saturating linear learning rate decay and momentum increase schedule, and the size of each layer. For dropout, the best probability of dropping a hidden unit is known to usually be around 0.5, and the best probability of dropping a visible unit is known to usually be around 0.2. In all cases, we first train on the old task until the validation set error has not improved in the last 100 epochs. Then we restore the parameters corresponding to the best validation set error, and begin training on the new task. We train until the error on the union of the old validation set and new validation set has not improved for 100 epochs. |
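The permuted-pixel construction quoted in the Open Datasets row is straightforward to reproduce. Below is a minimal sketch, assuming MNIST images are flattened into a NumPy array of shape (N, 784); the function and variable names are illustrative and not taken from the paper's released code.

```python
import numpy as np

def make_permuted_mnist_tasks(images, labels, seed=0):
    """Build the 'old' and 'new' tasks described in the paper: the same
    MNIST classification problem, with the pixels of the new task shuffled
    by a single fixed random permutation.

    `images` is assumed to be an (N, 784) float array and `labels` an (N,)
    integer array; both names are illustrative placeholders.
    """
    rng = np.random.RandomState(seed)
    permutation = rng.permutation(images.shape[1])  # one fixed pixel shuffle

    old_task = (images, labels)                  # original pixel order
    new_task = (images[:, permutation], labels)  # same labels, permuted pixels
    return old_task, new_task
```

Because the permutation is fixed across all examples, the new task is exactly as hard as the old one for a fully connected network, which is what makes it a clean probe of forgetting rather than of task difficulty.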
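The sequential training protocol quoted in the Experiment Setup row (train on the old task with patience-based early stopping, restore the best parameters, then train on the new task while validating on the union of both validation sets) can be summarized as follows. This is a minimal sketch that uses scikit-learn's SGDClassifier as a stand-in for the paper's Theano/Pylearn2 networks; the 100-epoch patience mirrors the criterion described in the paper, but the data layout and helper names are assumptions.

```python
from copy import deepcopy
import numpy as np
from sklearn.linear_model import SGDClassifier

PATIENCE = 100  # epochs without improvement before stopping, per the paper

def train_with_patience(clf, X_train, y_train, X_valid, y_valid, classes):
    """Run one epoch of SGD at a time and stop once validation error has not
    improved for PATIENCE consecutive epochs; return the best-epoch model."""
    best_err, best_clf, since_best = np.inf, deepcopy(clf), 0
    while since_best < PATIENCE:
        clf.partial_fit(X_train, y_train, classes=classes)  # one epoch
        err = 1.0 - clf.score(X_valid, y_valid)
        if err < best_err:
            best_err, best_clf, since_best = err, deepcopy(clf), 0
        else:
            since_best += 1
    return best_clf  # parameters corresponding to the best validation error

def two_task_experiment(old_task, new_task, classes):
    """old_task / new_task are assumed (X_train, y_train, X_valid, y_valid) tuples."""
    clf = SGDClassifier(loss="log_loss")

    # Phase 1: train on the old task, stopping on the old validation set.
    clf = train_with_patience(clf, old_task[0], old_task[1],
                              old_task[2], old_task[3], classes)

    # Phase 2: continue from the restored best parameters on the new task,
    # stopping on the union of the old and new validation sets.
    X_valid_union = np.vstack([old_task[2], new_task[2]])
    y_valid_union = np.concatenate([old_task[3], new_task[3]])
    clf = train_with_patience(clf, new_task[0], new_task[1],
                              X_valid_union, y_valid_union, classes)
    return clf
```

In the paper this loop is repeated for each of the 25 randomly sampled hyperparameter settings per condition; the sketch above covers only the two-phase training and stopping rule, not the hyperparameter search itself.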