How I Learned to Stop Worrying and Love Retraining

Authors: Max Zimmer, Christoph Spiegel, Sebastian Pokutta

ICLR 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We empirically find that the results of Li et al. (2020) regarding the budgeted training of neural networks apply to the retraining phase of IMP, providing further context for the results of Renda et al. (2020) and Le & Hua (2021). Building on this, we find that the runtime of IMP can be drastically shortened by using a simple linear learning rate schedule with little to no degradation in model performance. We perform extensive experiments on image recognition datasets such as ImageNet (Russakovsky et al., 2015), CIFAR-10/100 (Krizhevsky et al., 2009), the semantic segmentation tasks COCO (Lin et al., 2014) and Cityscapes (Cordts et al., 2016), as well as neural machine translation (NMT) on WMT16 (Bojar et al., 2016). In particular, we employed ResNets (He et al., 2015), Wide ResNets (WRN) (Zagoruyko & Komodakis, 2016), VGG (Simonyan & Zisserman, 2014), the transformer-based MaxViT (Tu et al., 2022) architecture, as well as PSPNet (Zhao et al., 2017) and DeepLabV3 (Chen et al., 2017) in the case of Cityscapes and COCO, respectively. (A minimal sketch of this prune-retrain loop is given after the table.)
Researcher Affiliation | Academia | Max Zimmer¹, Christoph Spiegel¹ & Sebastian Pokutta¹,². ¹Department for AI in Society, Science, and Technology, Zuse Institute Berlin, Germany. ²Institute of Mathematics, Technische Universität Berlin, Germany. {zimmer,spiegel,pokutta}@zib.de
Pseudocode | No | The paper describes algorithms and methods verbally and through conceptual figures, but it does not include any formal pseudocode blocks or algorithm listings.
Open Source Code | Yes | We have made our code and general setup available at github.com/ZIB-IOL/BIMP for the sake of reproducibility.
Open Datasets | Yes | We perform extensive experiments on image recognition datasets such as ImageNet (Russakovsky et al., 2015), CIFAR-10/100 (Krizhevsky et al., 2009), the semantic segmentation tasks COCO (Lin et al., 2014) and Cityscapes (Cordts et al., 2016), as well as neural machine translation (NMT) on WMT16 (Bojar et al., 2016).
Dataset Splits | Yes | We use a validation set of 10% of the training data for hyperparameter selection. (A sketch of one way to realize such a split follows the table.)
Hardware Specification | No | The paper does not explicitly state the specific hardware used for running the experiments (e.g., GPU models, CPU types, or cloud instance specifications).
Software Dependencies | No | The paper mentions software such as the PyTorch framework, Hugging Face, and Weights & Biases, but it does not specify version numbers for these dependencies, which is required for reproducibility.
Experiment Setup | Yes | Table 3: Exact training configurations used throughout the experiments for IMP. We note that others have reported an accuracy of around 80% for WRN28x10 trained on CIFAR-100 that we were unable to replicate; the discrepancy is most likely due to an inconsistency in PyTorch's dropout implementation. For experiments involving Vision Transformers, we used label smoothing as well as gradient clipping. For the COCO and Cityscapes architectures, we rely on pretrained backbones and report the common mean Intersection-over-Union (IoU) metric measured on the validation set. For the NMT task we report the BLEU score on the test set, where we limit the sequence length to 128 throughout. (Illustrative snippets for the label-smoothing and gradient-clipping settings also follow the table.)
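
As noted in the Pseudocode row, the paper itself contains no algorithm listings. For concreteness, here is a minimal PyTorch sketch of the prune-retrain loop described above: iterative magnitude pruning where each retraining phase follows a simple linear learning rate schedule. This is an illustrative reconstruction, not the authors' released implementation (see github.com/ZIB-IOL/BIMP for that); the function names, pruning fraction, learning rate, cycle and epoch counts, and the toy model and data are all assumptions made for the example.

```python
import torch
import torch.nn.utils.prune as prune
from torch.utils.data import DataLoader, TensorDataset

def magnitude_prune(model, amount):
    # Globally zero out the `amount` fraction of the smallest-magnitude
    # weights among all Linear/Conv2d layers that are still unpruned.
    params = [(m, "weight") for m in model.modules()
              if isinstance(m, (torch.nn.Linear, torch.nn.Conv2d))]
    prune.global_unstructured(params, pruning_method=prune.L1Unstructured, amount=amount)

def retrain_linear(model, loader, loss_fn, lr_max, epochs):
    # Budgeted retraining: anneal the learning rate linearly from lr_max to 0
    # over exactly the allotted retraining budget, with no warm restarts.
    opt = torch.optim.SGD(model.parameters(), lr=lr_max, momentum=0.9)
    sched = torch.optim.lr_scheduler.LinearLR(
        opt, start_factor=1.0, end_factor=0.0, total_iters=epochs * len(loader))
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
            sched.step()

# Toy stand-ins for the paper's ResNet/ImageNet settings.
model = torch.nn.Sequential(torch.nn.Linear(20, 64), torch.nn.ReLU(), torch.nn.Linear(64, 10))
loader = DataLoader(TensorDataset(torch.randn(256, 20), torch.randint(0, 10, (256,))),
                    batch_size=32)
loss_fn = torch.nn.CrossEntropyLoss()

for cycle in range(3):                  # number of prune-retrain cycles (placeholder)
    magnitude_prune(model, amount=0.2)  # prune 20% of the remaining weights (placeholder)
    retrain_linear(model, loader, loss_fn, lr_max=0.1, epochs=5)
```

The point this is meant to illustrate is the finding quoted in the Research Type row: annealing the learning rate over whatever retraining budget is available, in the spirit of Li et al.'s budgeted training, rather than replaying the original schedule, is what lets shortened retraining remain competitive.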
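The Dataset Splits row states a 10% validation holdout but not how it is drawn. A straightforward way to realize it in PyTorch, assuming a seeded random split (the seed and the use of `random_split` are assumptions, not details from the paper):

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Stand-in training set; in practice this would be e.g. CIFAR-10/100 or ImageNet.
train_set = TensorDataset(torch.randn(50_000, 3, 32, 32), torch.randint(0, 10, (50_000,)))

# Hold out 10% of the training data for hyperparameter selection.
# The fixed seed (an assumption) makes the split reproducible across runs.
n_val = len(train_set) // 10
train_subset, val_subset = random_split(
    train_set, [len(train_set) - n_val, n_val],
    generator=torch.Generator().manual_seed(0))
```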
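Finally, the label smoothing and gradient clipping mentioned in the Experiment Setup row map directly onto standard PyTorch utilities. The snippet below shows where each enters a training step; the smoothing value of 0.1 and the clipping norm of 1.0 are common defaults used here as placeholders, not values taken from the paper's Table 3.

```python
import torch

model = torch.nn.Linear(10, 10)  # stand-in for the MaxViT architecture
opt = torch.optim.AdamW(model.parameters(), lr=1e-3)
# label_smoothing=0.1 is a common default, assumed here rather than taken from the paper.
loss_fn = torch.nn.CrossEntropyLoss(label_smoothing=0.1)

x, y = torch.randn(8, 10), torch.randint(0, 10, (8,))
opt.zero_grad()
loss_fn(model(x), y).backward()
# Clip the global gradient norm before the optimizer step; max_norm=1.0 is assumed.
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
opt.step()
```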