Rethinking the Hyperparameters for Fine-tuning
Authors: Hao Li, Pratik Chaudhari, Hao Yang, Michael Lam, Avinash Ravichandran, Rahul Bhotika, Stefano Soatto
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our findings are based on extensive empirical evaluation for fine-tuning on various transfer learning benchmarks. |
| Researcher Affiliation | Collaboration | Hao Li1, Pratik Chaudhari2 , Hao Yang1, Michael Lam1, Avinash Ravichandran1, Rahul Bhotika1, Stefano Soatto1,3 1Amazon Web Services, 2University of Pennsylvania, 3University of California, Los Angeles |
| Pseudocode | No | The paper does not contain any sections explicitly labeled 'Pseudocode' or 'Algorithm', nor does it present any structured code blocks within the text. |
| Open Source Code | No | The paper refers to 'the code provided by (Li et al., 2018)' (footnote 5 links to 'https://github.com/holyseven/TransferLearningClassification') for comparison experiments in Section 3.4, but does not state that the code for its own methodology or experimental setup is being released or made available. |
| Open Datasets | Yes | We evaluate fine-tuning on seven widely used image classification datasets, which cover tasks for fine-grained object recognition, scene recognition and general object recognition. Detailed statistics of each dataset can be seen in Table 1. We use ImageNet (Russakovsky et al., 2015), Places-365 (Zhou et al., 2018) and iNaturalist (Van Horn et al., 2018) as source domains for pre-trained models. |
| Dataset Splits | Yes | We report the Top-1 validation (test) error at the end of training. |
| Hardware Specification | Yes | For each training job with ResNet-101 and batch size 256, we use 8 NVIDIA Tesla V100 GPUs for synchronous training, where each GPU uses a batch of 32 and no SyncBN is used. |
| Software Dependencies | No | The paper states 'we use the pre-trained ResNet-101_v2 model from the model zoo of MXNet GluonCV' but does not provide specific version numbers for MXNet, GluonCV, or any other software libraries used. |
| Experiment Setup | Yes | The hyperparameters to be tuned (and ranges) are: learning rate (0.1, 0.05, 0.01, 0.005, 0.001, 0.0001), momentum (0.9, 0.99, 0.95, 0.9, 0.8, 0.0) and weight decay (0.0, 0.0001, 0.0005, 0.001). We set the default hyperparameters to be batch size 256, learning rate 0.01, momentum 0.9 and weight decay 0.0001. To avoid insufficient training and observe the complete convergence behavior, we use 300 epochs for fine-tuning and 600 epochs for scratch-training. The learning rate is decayed by a factor of 0.1 at epochs 150 and 250. (A minimal configuration sketch follows the table.) |
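
For concreteness, the default setup quoted in the Experiment Setup and Hardware Specification rows can be summarized in a short configuration sketch. This is not the authors' code (none is released, per the Open Source Code row); it is a minimal Python illustration, assuming a generic SGD training loop, of the default hyperparameters, the reported search grid, and the step learning-rate schedule (decay by 0.1 at epochs 150 and 250 over 300 epochs). Names such as `DEFAULT_CONFIG` and `lr_at_epoch` are illustrative, not from the paper.

```python
# Illustrative sketch of the fine-tuning setup described in the table (not the authors' code).

# Default hyperparameters reported in the paper.
DEFAULT_CONFIG = {
    "batch_size": 256,              # 8 GPUs x 32 images per GPU, synchronous SGD, no SyncBN
    "learning_rate": 0.01,
    "momentum": 0.9,
    "weight_decay": 1e-4,
    "epochs": 300,                  # 600 epochs when training from scratch
    "lr_decay_epochs": (150, 250),  # milestones for the step schedule
    "lr_decay_factor": 0.1,
}

# Search ranges for the tuned hyperparameters, exactly as listed in the quote.
SEARCH_GRID = {
    "learning_rate": [0.1, 0.05, 0.01, 0.005, 0.001, 0.0001],
    "momentum": [0.9, 0.99, 0.95, 0.9, 0.8, 0.0],
    "weight_decay": [0.0, 0.0001, 0.0005, 0.001],
}


def lr_at_epoch(epoch: int, cfg: dict = DEFAULT_CONFIG) -> float:
    """Step schedule: multiply the base learning rate by the decay factor
    at every milestone epoch that has already been reached."""
    lr = cfg["learning_rate"]
    for milestone in cfg["lr_decay_epochs"]:
        if epoch >= milestone:
            lr *= cfg["lr_decay_factor"]
    return lr


if __name__ == "__main__":
    # e.g. 0.01 before epoch 150, 0.001 for epochs 150-249, 0.0001 from epoch 250 onward
    for e in (0, 149, 150, 250, 299):
        print(e, lr_at_epoch(e))
```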