How I Learned to Stop Worrying and Love Retraining

Authors: Max Zimmer, Christoph Spiegel, Sebastian Pokutta

ICLR 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We empirically find that the results of Li et al. (2020) regarding the budgeted training of neural networks apply to the retraining phase of IMP, providing further context for the results of Renda et al. (2020) and Le & Hua (2021). Building on this, we find that the runtime of IMP can be drastically shortened by using a simple linear learning rate schedule with little to no degradation in model performance. We perform extensive experiments on image recognition datasets such as ImageNet (Russakovsky et al., 2015), CIFAR-10/100 (Krizhevsky et al., 2009), the semantic segmentation tasks COCO (Lin et al., 2014) and Cityscapes (Cordts et al., 2016), as well as neural machine translation (NMT) on WMT16 (Bojar et al., 2016). In particular, we employed ResNets (He et al., 2015), Wide ResNets (WRN) (Zagoruyko & Komodakis, 2016), VGG (Simonyan & Zisserman, 2014), the transformer-based MaxViT (Tu et al., 2022) architecture, as well as PSPNet (Zhao et al., 2017) and DeepLabV3 (Chen et al., 2017) in the case of Cityscapes and COCO, respectively. (A minimal sketch of this prune-retrain loop is given after the table.)
Researcher Affiliation | Academia | Max Zimmer¹, Christoph Spiegel¹ & Sebastian Pokutta¹,². ¹Department for AI in Society, Science, and Technology, Zuse Institute Berlin, Germany. ²Institute of Mathematics, Technische Universität Berlin, Germany. {zimmer,spiegel,pokutta}@zib.de
Pseudocode | No | The paper describes algorithms and methods verbally and through conceptual figures, but it does not include any formal pseudocode blocks or algorithm listings.
Open Source Code | Yes | We have made our code and general setup available at github.com/ZIB-IOL/BIMP for the sake of reproducibility.
Open Datasets | Yes | We perform extensive experiments on image recognition datasets such as ImageNet (Russakovsky et al., 2015), CIFAR-10/100 (Krizhevsky et al., 2009), the semantic segmentation tasks COCO (Lin et al., 2014) and Cityscapes (Cordts et al., 2016), as well as neural machine translation (NMT) on WMT16 (Bojar et al., 2016).
Dataset Splits | Yes | We use a validation set of 10% of the training data for hyperparameter selection. (A sketch of one way to realize such a split follows the table.)
Hardware Specification | No | The paper does not explicitly state the specific hardware used for running the experiments (e.g., GPU models, CPU types, or cloud instance specifications).
Software Dependencies | No | The paper mentions software such as the PyTorch framework, Hugging Face, and Weights & Biases, but it does not specify version numbers for these dependencies, which is required for reproducibility.
Experiment Setup | Yes | Table 3: Exact training configurations used throughout the experiments for IMP. We note that others have reported an accuracy of around 80% for WRN28x10 trained on CIFAR-100 that we were unable to replicate; the discrepancy is most likely due to an inconsistency in PyTorch's dropout implementation. For experiments involving Vision Transformers, we used label smoothing as well as gradient clipping. For the COCO and Cityscapes architectures, we rely on pretrained backbones and report the common mean Intersection-over-Union (IoU) metric measured on the validation set. For the NMT task we report the BLEU score on the test set, where we limit the sequence length to 128 throughout. (Illustrative snippets for the label-smoothing and gradient-clipping settings also follow the table.)
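
As noted in the Pseudocode row, the paper itself contains no algorithm listings. For concreteness, here is a minimal PyTorch sketch of the prune-retrain loop described above: iterative magnitude pruning where each retraining phase follows a simple linear learning rate schedule. This is an illustrative reconstruction, not the authors' released implementation (see github.com/ZIB-IOL/BIMP for that); the function names, pruning fraction, learning rate, cycle and epoch counts, and the toy model and data are all assumptions made for the example.

```python
import torch
import torch.nn.utils.prune as prune
from torch.utils.data import DataLoader, TensorDataset

def magnitude_prune(model, amount):
    # Globally zero out the `amount` fraction of the smallest-magnitude
    # weights among all Linear/Conv2d layers that are still unpruned.
    params = [(m, "weight") for m in model.modules()
              if isinstance(m, (torch.nn.Linear, torch.nn.Conv2d))]
    prune.global_unstructured(params, pruning_method=prune.L1Unstructured, amount=amount)

def retrain_linear(model, loader, loss_fn, lr_max, epochs):
    # Budgeted retraining: anneal the learning rate linearly from lr_max to 0
    # over exactly the allotted retraining budget, with no warm restarts.
    opt = torch.optim.SGD(model.parameters(), lr=lr_max, momentum=0.9)
    sched = torch.optim.lr_scheduler.LinearLR(
        opt, start_factor=1.0, end_factor=0.0, total_iters=epochs * len(loader))
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
            sched.step()

# Toy stand-ins for the paper's ResNet/ImageNet settings.
model = torch.nn.Sequential(torch.nn.Linear(20, 64), torch.nn.ReLU(), torch.nn.Linear(64, 10))
loader = DataLoader(TensorDataset(torch.randn(256, 20), torch.randint(0, 10, (256,))),
                    batch_size=32)
loss_fn = torch.nn.CrossEntropyLoss()

for cycle in range(3):                  # number of prune-retrain cycles (placeholder)
    magnitude_prune(model, amount=0.2)  # prune 20% of the remaining weights (placeholder)
    retrain_linear(model, loader, loss_fn, lr_max=0.1, epochs=5)
```

The point this is meant to illustrate is the finding quoted in the Research Type row: annealing the learning rate over whatever retraining budget is available, in the spirit of Li et al.'s budgeted training, rather than replaying the original schedule, is what lets shortened retraining remain competitive.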
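The Dataset Splits row states a 10% validation holdout but not how it is drawn. A straightforward way to realize it in PyTorch, assuming a seeded random split (the seed and the use of `random_split` are assumptions, not details from the paper):

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Stand-in training set; in practice this would be e.g. CIFAR-10/100 or ImageNet.
train_set = TensorDataset(torch.randn(50_000, 3, 32, 32), torch.randint(0, 10, (50_000,)))

# Hold out 10% of the training data for hyperparameter selection.
# The fixed seed (an assumption) makes the split reproducible across runs.
n_val = len(train_set) // 10
train_subset, val_subset = random_split(
    train_set, [len(train_set) - n_val, n_val],
    generator=torch.Generator().manual_seed(0))
```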
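Finally, the label smoothing and gradient clipping mentioned in the Experiment Setup row map directly onto standard PyTorch utilities. The snippet below shows where each enters a training step; the smoothing value of 0.1 and the clipping norm of 1.0 are common defaults used here as placeholders, not values taken from the paper's Table 3.

```python
import torch

model = torch.nn.Linear(10, 10)  # stand-in for the MaxViT architecture
opt = torch.optim.AdamW(model.parameters(), lr=1e-3)
# label_smoothing=0.1 is a common default, assumed here rather than taken from the paper.
loss_fn = torch.nn.CrossEntropyLoss(label_smoothing=0.1)

x, y = torch.randn(8, 10), torch.randint(0, 10, (8,))
opt.zero_grad()
loss_fn(model(x), y).backward()
# Clip the global gradient norm before the optimizer step; max_norm=1.0 is assumed.
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
opt.step()
```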