Rethinking Supervised Pre-Training for Better Downstream Transferring
Authors: Yutong Feng, Jianwen Jiang, Mingqian Tang, Rong Jin, Yue Gao
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive empirical studies on multiple downstream tasks show that LOOK outperforms other state-of-the-art methods for supervised and self-supervised pre-training. |
| Researcher Affiliation | Collaboration | Yutong Feng (BNRist, THUIBCS, KLISS, BLBCI, School of Software, Tsinghua University; fyt19@mails.tsinghua.edu.cn); Jianwen Jiang (Alibaba Group; jianwen.jjw@alibaba-inc.com); Mingqian Tang (Alibaba Group; mingqian.tmq@alibaba-inc.com); Rong Jin (Alibaba Group; jinrong.jr@alibaba-inc.com); Yue Gao (BNRist, THUIBCS, KLISS, BLBCI, School of Software, Tsinghua University; gaoyue@tsinghua.edu.cn) |
| Pseudocode | No | The paper describes the mathematical formulation and details of the LOOK algorithm, but it does not present it as a formal pseudocode or algorithm block. |
| Open Source Code | No | The paper mentions utilizing the 'official open-source codebase' or 'official provided pre-trained model' for *other* methods (Exemplar-v2, MoCo-v2, SimSiam, SimCLR, BYOL, Transfer-Learning-Library) but does not provide a link or statement about open-sourcing the code for the LOOK method developed in this paper. |
| Open Datasets | Yes | For the upstream dataset, we use the ImageNet ILSVRC (Deng et al., 2009) with 1.28M images of 1K categories since most pre-training methods for comparison are trained on ImageNet. For the downstream datasets, we select 9 fine-grained datasets from varying domains to evaluate the model's transferability inspired by Islam et al. (2021), including the Aircraft (Maji et al., 2013), Cars (Krause et al., 2013), DTD (Cimpoi et al., 2014), EuroSAT (Helber et al., 2019), Flowers (Nilsback & Zisserman, 2008), ISIC (Codella et al., 2019), Kaokore (Tian et al., 2020), Omniglot (Lake et al., 2015) and Pets (Parkhi et al., 2012). |
| Dataset Splits | Yes | For the train/validation/test split of each dataset, we follow the original split for those with official split file, i.e. Aircraft, DTD (the first official split), Flowers and Kaokore. For the remaining datasets with only train/test split, we preserve the test set, and randomly split the training set into training and validation sets with the proportion of 7 : 3 inside each category. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., CPU, GPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions software like TensorFlow and PyTorch but does not specify their version numbers or the versions of any other key libraries or dependencies. |
| Experiment Setup | Yes | For the implementation of our proposed LOOK, we use queue size q = 65536, momentum m = 0.99, temperature τ = 1.0 and decaying k linearly from 400 to 40. All the implemented methods are trained for 90 epochs with an initial learning rate of 0.1, multiplied by 0.1 every 30 epochs. We use ResNet-50 (He et al., 2016) as the backbone encoder and train using the SGD optimizer with momentum 0.9 and weight decay 0.0001. During the fine-tuning stage, we train on the downstream datasets for 50 epochs and decay the learning rate at the 25th and 37th epochs by 0.1. For the remaining hyper-parameters of training, we conduct a grid search over initial learning rates of 0.001, 0.01 and 0.1, weight decay of 0, 1e-4 and 1e-5, and batch sizes of 32 and 128. |
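The per-category 7:3 train/validation split described in the Dataset Splits row can be sketched as follows. This is a minimal illustration, not code from the paper: the function name, seed handling, and the `(path, label)` data layout are our own assumptions.

```python
import random
from collections import defaultdict

def stratified_split(samples, train_frac=0.7, seed=0):
    """Split (path, label) pairs into train/val sets, 7:3 inside each category.

    Mirrors the paper's description: the test set is left untouched, and the
    original training set is partitioned per class so every category keeps
    the same 7:3 proportion.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for path, label in samples:
        by_label[label].append(path)

    train, val = [], []
    for label, paths in by_label.items():
        rng.shuffle(paths)  # randomize within the category before cutting
        cut = int(round(len(paths) * train_frac))
        train += [(p, label) for p in paths[:cut]]
        val += [(p, label) for p in paths[cut:]]
    return train, val
```

This split would be applied only to Cars, EuroSAT, ISIC, Omniglot and Pets style datasets that ship without an official validation set; datasets with official splits (Aircraft, DTD, Flowers, Kaokore) keep theirs.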
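The learning-rate schedules and the linear decay of k quoted in the Experiment Setup row can likewise be sketched. Hedged assumptions: the paper does not state whether k is updated per epoch or per iteration, and the interpolation endpoints (epoch 0 to the final epoch) are our reading of "decaying k linearly from 400 to 40".

```python
def pretrain_lr(epoch, base_lr=0.1):
    """Pre-training step schedule: start at 0.1, multiply by 0.1 every 30
    epochs over a 90-epoch run."""
    return base_lr * (0.1 ** (epoch // 30))

def finetune_lr(epoch, base_lr):
    """Fine-tuning schedule: decay by 0.1 at the 25th and again at the 37th
    epoch of a 50-epoch run; base_lr comes from the paper's grid search."""
    factor = 1.0
    if epoch >= 25:
        factor *= 0.1
    if epoch >= 37:
        factor *= 0.1
    return base_lr * factor

def look_k(epoch, total_epochs=90, k_start=400, k_end=40):
    """Linear decay of LOOK's neighbor count k from 400 down to 40
    (per-epoch update assumed)."""
    frac = epoch / (total_epochs - 1)
    return round(k_start + (k_end - k_start) * frac)
```

In a PyTorch training loop the two LR schedules would typically be realized with `torch.optim.lr_scheduler.MultiStepLR` (milestones `[30, 60]` and `[25, 37]`, `gamma=0.1`); the pure-Python form above just makes the arithmetic explicit.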