An Empirical Investigation of Catastrophic Forgetting in Gradient-Based Neural Networks
Authors: Ian J. Goodfellow, Mehdi Mirza, Da Xiao, Aaron Courville, Yoshua Bengio
ICLR 2014
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | An Empirical Investigation of Catastrophic Forgetting in Gradient-Based Neural Networks. Here, we investigate the extent to which the catastrophic forgetting problem occurs for modern neural networks, comparing both established and recent gradient-based training algorithms and activation functions. We also examine the effect of the relationship between the first task and the second task on catastrophic forgetting. |
| Researcher Affiliation | Academia | Ian J. Goodfellow (goodfeli@iro.umontreal.ca), Mehdi Mirza (mirzamom@iro.umontreal.ca), Da Xiao (xiaoda99@bupt.edu.cn), Aaron Courville (aaron.courville@umontreal.ca), Yoshua Bengio (yoshua.bengio@umontreal.ca) |
| Pseudocode | No | The paper describes algorithms in text, but no structured pseudocode blocks or algorithm listings are present. |
| Open Source Code | Yes | Code associated with this paper is available at https://github.com/goodfeli/forgetting. |
| Open Datasets | Yes | Specifically, we used MNIST classification, but with a different permutation of the pixels for the old task and the new task. To test this case, we used sentiment analysis of two product categories of Amazon reviews (Blitzer et al., 2007) as the two tasks. |
| Dataset Splits | No | The paper mentions using a validation set to stop training and refers to the 'MNIST validation set' and 'Amazon validation set', with some detail about subsampling, but it does not provide explicit split percentages or sample counts for all datasets, nor citations to predefined splits sufficient for reproduction. |
| Hardware Specification | No | The acknowledgments credit NSERC, Compute Canada, and Calcul Québec for computational resources, but the paper does not specify any particular hardware components, such as CPU or GPU models, used for the experiments. |
| Software Dependencies | No | The paper acknowledges the use of 'Theano' and 'Pylearn2', citing their respective papers, but does not provide specific version numbers for these or any other software dependencies. |
| Experiment Setup | Yes | For each of these eight conditions, we randomly generate 25 random sets of hyperparameters. ... The hyperparameters we search over include the magnitude of the maxnorm constraint (Srebro & Shraibman, 2005) for each layer, the method used to initialize the weights for each layer and any hyper-parameters associated with such method, the initial biases for each layer, the parameters controlling a saturating linear learning rate decay and momentum increase schedule, and the size of each layer. For dropout, the best probability of dropping a hidden unit is known to usually be around 0.5, and the best probability of dropping a visible unit is known to usually be around 0.2. In all cases, we first train on the old task until the validation set error has not improved in the last 100 epochs. Then we restore the parameters corresponding to the best validation set error, and begin training on the new task. We train until the error on the union of the old validation set and new validation set has not improved for 100 epochs. |
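The permuted-pixel construction quoted in the Open Datasets row is straightforward to reproduce. Below is a minimal sketch, assuming MNIST images are flattened into a NumPy array of shape (N, 784); the function and variable names are illustrative and not taken from the paper's released code.

```python
import numpy as np

def make_permuted_mnist_tasks(images, labels, seed=0):
    """Build the 'old' and 'new' tasks described in the paper: the same
    MNIST classification problem, with the pixels of the new task shuffled
    by a single fixed random permutation.

    `images` is assumed to be an (N, 784) float array and `labels` an (N,)
    integer array; both names are illustrative placeholders.
    """
    rng = np.random.RandomState(seed)
    permutation = rng.permutation(images.shape[1])  # one fixed pixel shuffle

    old_task = (images, labels)                  # original pixel order
    new_task = (images[:, permutation], labels)  # same labels, permuted pixels
    return old_task, new_task
```

Because the permutation is fixed across all examples, the new task is exactly as hard as the old one for a fully connected network, which is what makes it a clean probe of forgetting rather than of task difficulty.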
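The sequential training protocol quoted in the Experiment Setup row (train on the old task with patience-based early stopping, restore the best parameters, then train on the new task while validating on the union of both validation sets) can be summarized as follows. This is a minimal sketch that uses scikit-learn's SGDClassifier as a stand-in for the paper's Theano/Pylearn2 networks; the 100-epoch patience mirrors the criterion described in the paper, but the data layout and helper names are assumptions.

```python
from copy import deepcopy
import numpy as np
from sklearn.linear_model import SGDClassifier

PATIENCE = 100  # epochs without improvement before stopping, per the paper

def train_with_patience(clf, X_train, y_train, X_valid, y_valid, classes):
    """Run one epoch of SGD at a time and stop once validation error has not
    improved for PATIENCE consecutive epochs; return the best-epoch model."""
    best_err, best_clf, since_best = np.inf, deepcopy(clf), 0
    while since_best < PATIENCE:
        clf.partial_fit(X_train, y_train, classes=classes)  # one epoch
        err = 1.0 - clf.score(X_valid, y_valid)
        if err < best_err:
            best_err, best_clf, since_best = err, deepcopy(clf), 0
        else:
            since_best += 1
    return best_clf  # parameters corresponding to the best validation error

def two_task_experiment(old_task, new_task, classes):
    """old_task / new_task are assumed (X_train, y_train, X_valid, y_valid) tuples."""
    clf = SGDClassifier(loss="log_loss")

    # Phase 1: train on the old task, stopping on the old validation set.
    clf = train_with_patience(clf, old_task[0], old_task[1],
                              old_task[2], old_task[3], classes)

    # Phase 2: continue from the restored best parameters on the new task,
    # stopping on the union of the old and new validation sets.
    X_valid_union = np.vstack([old_task[2], new_task[2]])
    y_valid_union = np.concatenate([old_task[3], new_task[3]])
    clf = train_with_patience(clf, new_task[0], new_task[1],
                              X_valid_union, y_valid_union, classes)
    return clf
```

In the paper this loop is repeated for each of the 25 randomly sampled hyperparameter settings per condition; the sketch above covers only the two-phase training and stopping rule, not the hyperparameter search itself.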