Rethinking the Hyperparameters for Fine-tuning
Authors: Hao Li, Pratik Chaudhari, Hao Yang, Michael Lam, Avinash Ravichandran, Rahul Bhotika, Stefano Soatto
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our findings are based on extensive empirical evaluation for fine-tuning on various transfer learning benchmarks. |
| Researcher Affiliation | Collaboration | Hao Li1, Pratik Chaudhari2 , Hao Yang1, Michael Lam1, Avinash Ravichandran1, Rahul Bhotika1, Stefano Soatto1,3 1Amazon Web Services, 2University of Pennsylvania, 3University of California, Los Angeles |
| Pseudocode | No | The paper does not contain any sections explicitly labeled 'Pseudocode' or 'Algorithm', nor does it present any structured code blocks within the text. |
| Open Source Code | No | The paper refers to 'the code provided by (Li et al., 2018)' (footnote 5 links to 'https://github.com/holyseven/TransferLearningClassification') for comparison experiments in Section 3.4, but does not state that the code for its own methodology or experimental setup is being released or made available. |
| Open Datasets | Yes | We evaluate fine-tuning on seven widely used image classification datasets, which cover tasks for fine-grained object recognition, scene recognition and general object recognition. Detailed statistics of each dataset can be seen in Table 1. We use ImageNet (Russakovsky et al., 2015), Places-365 (Zhou et al., 2018) and iNaturalist (Van Horn et al., 2018) as source domains for pre-trained models. |
| Dataset Splits | Yes | We report the Top-1 validation (test) error at the end of training. |
| Hardware Specification | Yes | For each training job with ResNet-101 and batch size 256, we use 8 NVIDIA Tesla V100 GPUs for synchronous training, where each GPU uses a batch of 32 and no SyncBN is used. |
| Software Dependencies | No | The paper states 'we use the pre-trained ResNet-101_v2 model from the model zoo of MXNet GluonCV' but does not provide specific version numbers for MXNet, GluonCV, or any other software libraries used. |
| Experiment Setup | Yes | The hyperparameters to be tuned (and ranges) are: learning rate (0.1, 0.05, 0.01, 0.005, 0.001, 0.0001), momentum (0.9, 0.99, 0.95, 0.9, 0.8, 0.0) and weight decay (0.0, 0.0001, 0.0005, 0.001). We set the default hyperparameters to be batch size 256, learning rate 0.01, momentum 0.9 and weight decay 0.0001. To avoid insufficient training and observe the complete convergence behavior, we use 300 epochs for fine-tuning and 600 epochs for scratch-training. The learning rate is decayed by a factor of 0.1 at epochs 150 and 250. (A minimal configuration sketch follows the table.) |
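
For concreteness, the default setup quoted in the Experiment Setup and Hardware Specification rows can be summarized in a short configuration sketch. This is not the authors' code (none is released, per the Open Source Code row); it is a minimal Python illustration, assuming a generic SGD training loop, of the default hyperparameters, the reported search grid, and the step learning-rate schedule (decay by 0.1 at epochs 150 and 250 over 300 epochs). Names such as `DEFAULT_CONFIG` and `lr_at_epoch` are illustrative, not from the paper.

```python
# Illustrative sketch of the fine-tuning setup described in the table (not the authors' code).

# Default hyperparameters reported in the paper.
DEFAULT_CONFIG = {
    "batch_size": 256,              # 8 GPUs x 32 images per GPU, synchronous SGD, no SyncBN
    "learning_rate": 0.01,
    "momentum": 0.9,
    "weight_decay": 1e-4,
    "epochs": 300,                  # 600 epochs when training from scratch
    "lr_decay_epochs": (150, 250),  # milestones for the step schedule
    "lr_decay_factor": 0.1,
}

# Search ranges for the tuned hyperparameters, exactly as listed in the quote.
SEARCH_GRID = {
    "learning_rate": [0.1, 0.05, 0.01, 0.005, 0.001, 0.0001],
    "momentum": [0.9, 0.99, 0.95, 0.9, 0.8, 0.0],
    "weight_decay": [0.0, 0.0001, 0.0005, 0.001],
}


def lr_at_epoch(epoch: int, cfg: dict = DEFAULT_CONFIG) -> float:
    """Step schedule: multiply the base learning rate by the decay factor
    at every milestone epoch that has already been reached."""
    lr = cfg["learning_rate"]
    for milestone in cfg["lr_decay_epochs"]:
        if epoch >= milestone:
            lr *= cfg["lr_decay_factor"]
    return lr


if __name__ == "__main__":
    # e.g. 0.01 before epoch 150, 0.001 for epochs 150-249, 0.0001 from epoch 250 onward
    for e in (0, 149, 150, 250, 299):
        print(e, lr_at_epoch(e))
```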