Fortuitous Forgetting in Connectionist Networks
Authors: Hattie Zhou, Ankit Vani, Hugo Larochelle, Aaron Courville
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We illustrate the power of this perspective by showing that we can significantly improve upon existing work in their respective settings through the use of more targeted forgetting operations. Our analysis in Section 5 sheds light on the mechanism through which iterative training leads to parameter values with better generalization properties. |
| Researcher Affiliation | Collaboration | Hattie Zhou (Mila, Université de Montréal); Ankit Vani (Mila, Université de Montréal); Hugo Larochelle (Mila, CIFAR Fellow, Google Research, Brain Team); Aaron Courville (Mila, Université de Montréal, CIFAR Fellow) |
| Pseudocode | No | The paper describes methods and processes in detail, including mathematical formulations and textual descriptions of algorithms like IMP or KE, but it does not provide any explicitly labeled 'Pseudocode' or 'Algorithm' blocks. |
| Open Source Code | Yes | We make our code available at https://github.com/hlml/fortuitous_forgetting. |
| Open Datasets | Yes | Flower (Nilsback & Zisserman, 2008): 102 classes, 1020 train / 1020 val / 6149 test, 8189 total; CUB (Wah et al., 2011): 200 classes, 5994 train / N/A val / 5794 test, 11788 total; Aircraft (Maji et al., 2013): 100 classes, 3334 train / 3333 val / 3333 test, 10000 total; MIT67 (Quattoni & Torralba, 2009): 67 classes, 5360 train / N/A val / 1340 test, 6700 total; Stanford-Dogs (Khosla et al., 2011): 120 classes, 12000 train / N/A val / 8580 test, 20580 total |
| Dataset Splits | Yes | Table A6: Summary of the five datasets used in Section 4.1, adopted from Taha et al. (2021), reports classes and train/val/test split sizes; e.g., Flower (Nilsback & Zisserman, 2008): 102 classes, 1020 train / 1020 val / 6149 test, 8189 total |
| Hardware Specification | No | The paper provides training details such as the optimizer, momentum, weight decay, learning-rate schedule, batch size, and number of epochs ('All networks are trained with stochastic gradient descent (SGD) with momentum of 0.9 and weight decay of 10^-4. We also use a cosine learning rate schedule (Loshchilov & Hutter, 2017). ... Models are trained with a batch size of 32 for 200 epochs each generation.'), but it does not specify any hardware components such as GPU models, CPU types, or memory. |
| Software Dependencies | No | The paper mentions software components like 'Adam optimizer (Kingma & Ba, 2014)' and implicitly uses deep learning frameworks (likely PyTorch, given the community and affiliations), but it does not specify any version numbers for these software dependencies (e.g., 'PyTorch 1.x', 'Python 3.x'). |
| Experiment Setup | Yes | All networks are trained with stochastic gradient descent (SGD) with momentum of 0.9 and weight decay of 10^-4. We also use a cosine learning rate schedule (Loshchilov & Hutter, 2017). Taha et al. (2021) use an initial learning rate of 0.256, but we find that a smaller learning rate than the one used in Taha et al. (2021) is beneficial for certain datasets, so we consider a learning rate in {0.1, 0.256} for all experiments and report the setting with the better validation performance. Models are trained with a batch size of 32 for 200 epochs each generation. |
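The Experiment Setup row above fully specifies the optimizer and schedule, so a minimal sketch of that configuration follows. PyTorch is an assumption (the paper does not name its framework; the Software Dependencies row only guesses it is likely), and `model`, `train_loader`, and `loss_fn` are hypothetical placeholders, not identifiers from the released code.

```python
# Sketch of the quoted training setup: SGD (momentum 0.9, weight decay 10^-4),
# cosine learning-rate schedule, batch size 32, 200 epochs per generation,
# with the initial learning rate chosen from {0.1, 0.256} by validation accuracy.
import torch

EPOCHS_PER_GENERATION = 200        # "200 epochs each generation"
BATCH_SIZE = 32                    # "batch size of 32"
LEARNING_RATES = (0.1, 0.256)      # grid reported in the paper

def train_one_generation(model, train_loader, loss_fn, lr):
    optimizer = torch.optim.SGD(
        model.parameters(),
        lr=lr,
        momentum=0.9,              # "momentum of 0.9"
        weight_decay=1e-4,         # "weight decay of 10^-4"
    )
    # Cosine schedule (Loshchilov & Hutter, 2017), stepped once per epoch.
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
        optimizer, T_max=EPOCHS_PER_GENERATION
    )
    for _ in range(EPOCHS_PER_GENERATION):
        for inputs, targets in train_loader:
            optimizer.zero_grad()
            loss = loss_fn(model(inputs), targets)
            loss.backward()
            optimizer.step()
        scheduler.step()
    return model
```

In this reading, each "generation" of the iterative-training procedure would rerun this loop, with the learning rate in `LEARNING_RATES` selected per dataset by validation performance as the quoted setup describes.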