Rethinking Supervised Pre-Training for Better Downstream Transferring
Authors: Yutong Feng, Jianwen Jiang, Mingqian Tang, Rong Jin, Yue Gao
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive empirical studies on multiple downstream tasks show that LOOK outperforms other state-of-the-art methods for supervised and self-supervised pre-training. |
| Researcher Affiliation | Collaboration | Yutong Feng (BNRist, THUIBCS, KLISS, BLBCI, School of Software, Tsinghua University; fyt19@mails.tsinghua.edu.cn); Jianwen Jiang (Alibaba Group; jianwen.jjw@alibaba-inc.com); Mingqian Tang (Alibaba Group; mingqian.tmq@alibaba-inc.com); Rong Jin (Alibaba Group; jinrong.jr@alibaba-inc.com); Yue Gao (BNRist, THUIBCS, KLISS, BLBCI, School of Software, Tsinghua University; gaoyue@tsinghua.edu.cn) |
| Pseudocode | No | The paper describes the mathematical formulation and details of the LOOK algorithm, but it does not present it as a formal pseudocode or algorithm block. |
| Open Source Code | No | The paper mentions utilizing the 'official open-source codebase' or 'official provided pre-trained model' for *other* methods (Exemplar-v2, MoCo-v2, SimSiam, SimCLR, BYOL, Transfer-Learning-Library) but does not provide a link or statement about open-sourcing the code for the LOOK method developed in this paper. |
| Open Datasets | Yes | For the upstream dataset, we use the ImageNet ILSVRC (Deng et al., 2009) with 1.28M images of 1K categories since most pre-training methods for comparison are trained on ImageNet. For the downstream datasets, we select 9 fine-grained datasets from varying domains to evaluate the model's transferability inspired by Islam et al. (2021), including the Aircraft (Maji et al., 2013), Cars (Krause et al., 2013), DTD (Cimpoi et al., 2014), EuroSAT (Helber et al., 2019), Flowers (Nilsback & Zisserman, 2008), ISIC (Codella et al., 2019), Kaokore (Tian et al., 2020), Omniglot (Lake et al., 2015) and Pets (Parkhi et al., 2012). |
| Dataset Splits | Yes | For the train/validation/test split of each dataset, we follow the original split for those with official split file, i.e. Aircraft, DTD (the first official split), Flowers and Kaokore. For the remaining datasets with only train/test split, we preserve the test set, and randomly split the training set into training and validation sets with the proportion of 7 : 3 inside each category. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., CPU, GPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions software like TensorFlow and PyTorch but does not specify their version numbers or the versions of any other key libraries or dependencies. |
| Experiment Setup | Yes | For the implementation of our proposed LOOK, we use queue size q = 65536, momentum m = 0.99, temperature τ = 1.0 and decaying k linearly from 400 to 40. All the implemented methods are trained for 90 epochs with an initial learning rate of 0.1, multiplied by 0.1 every 30 epochs. We use ResNet-50 (He et al., 2016) as the backbone encoder and train using the SGD optimizer with momentum 0.9 and weight decay 0.0001. During the fine-tuning stage, we train on the downstream datasets for 50 epochs and decay the learning rate at the 25th and 37th epochs by 0.1. For the remaining hyper-parameters of training, we conduct a grid search over initial learning rates of 0.001, 0.01 and 0.1, weight decay of 0, 1e-4 and 1e-5, and batch sizes of 32 and 128. |
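The per-category 7:3 train/validation split described in the Dataset Splits row can be sketched as follows. This is a minimal illustration, not code from the paper: the function name, seed handling, and the `(path, label)` data layout are our own assumptions.

```python
import random
from collections import defaultdict

def stratified_split(samples, train_frac=0.7, seed=0):
    """Split (path, label) pairs into train/val sets, 7:3 inside each category.

    Mirrors the paper's description: the test set is left untouched, and the
    original training set is partitioned per class so every category keeps
    the same 7:3 proportion.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for path, label in samples:
        by_label[label].append(path)

    train, val = [], []
    for label, paths in by_label.items():
        rng.shuffle(paths)  # randomize within the category before cutting
        cut = int(round(len(paths) * train_frac))
        train += [(p, label) for p in paths[:cut]]
        val += [(p, label) for p in paths[cut:]]
    return train, val
```

This split would be applied only to Cars, EuroSAT, ISIC, Omniglot and Pets style datasets that ship without an official validation set; datasets with official splits (Aircraft, DTD, Flowers, Kaokore) keep theirs.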
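The learning-rate schedules and the linear decay of k quoted in the Experiment Setup row can likewise be sketched. Hedged assumptions: the paper does not state whether k is updated per epoch or per iteration, and the interpolation endpoints (epoch 0 to the final epoch) are our reading of "decaying k linearly from 400 to 40".

```python
def pretrain_lr(epoch, base_lr=0.1):
    """Pre-training step schedule: start at 0.1, multiply by 0.1 every 30
    epochs over a 90-epoch run."""
    return base_lr * (0.1 ** (epoch // 30))

def finetune_lr(epoch, base_lr):
    """Fine-tuning schedule: decay by 0.1 at the 25th and again at the 37th
    epoch of a 50-epoch run; base_lr comes from the paper's grid search."""
    factor = 1.0
    if epoch >= 25:
        factor *= 0.1
    if epoch >= 37:
        factor *= 0.1
    return base_lr * factor

def look_k(epoch, total_epochs=90, k_start=400, k_end=40):
    """Linear decay of LOOK's neighbor count k from 400 down to 40
    (per-epoch update assumed)."""
    frac = epoch / (total_epochs - 1)
    return round(k_start + (k_end - k_start) * frac)
```

In a PyTorch training loop the two LR schedules would typically be realized with `torch.optim.lr_scheduler.MultiStepLR` (milestones `[30, 60]` and `[25, 37]`, `gamma=0.1`); the pure-Python form above just makes the arithmetic explicit.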