Co-Tuning for Transfer Learning

Authors: Kaichao You, Zhi Kou, Mingsheng Long, Jianmin Wang

NeurIPS 2020

Reproducibility assessment (variable, result, and LLM response):

Research Type (Experimental): A simple instantiation of the framework shows strong empirical results in four visual classification tasks and one NLP classification task, bringing up to 20% relative improvement.

Researcher Affiliation (Academia): Kaichao You, Zhi Kou, Mingsheng Long, Jianmin Wang. School of Software, BNRist, Research Center for Big Data, Tsinghua University, China. {ykc20,kz19}@mails.tsinghua.edu.cn, {mingsheng,jimwang}@tsinghua.edu.cn

Pseudocode (Yes): Algorithm 1, category relationship learning (the reverse approach); Algorithm 2, neural network calibration.
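Algorithm 1 estimates the relationship between pre-trained (source) categories and target categories, and Algorithm 2 calibrates the pre-trained classifier so its predicted probabilities can be trusted. For illustration, here is a minimal sketch of the simpler direct estimate of p(y_s | y_t) that the paper describes alongside the reverse approach: average the calibrated source-category predictions over the target samples of each class. All names are illustrative; this is a sketch, not the paper's exact Algorithm 1.

```python
import numpy as np

def estimate_category_relationship(source_probs, target_labels, num_target_classes):
    """Direct estimate of p(y_s | y_t): average the calibrated source-category
    probabilities of the pre-trained model over target samples of each class.

    source_probs  : (N, C_s) calibrated predictions on target training data
    target_labels : (N,) integer target labels in [0, num_target_classes)
    """
    num_source_classes = source_probs.shape[1]
    relationship = np.zeros((num_target_classes, num_source_classes))
    for t in range(num_target_classes):
        # Mean calibrated prediction over all target samples whose label is t
        relationship[t] = source_probs[target_labels == t].mean(axis=0)
    return relationship  # row t approximates p(y_s | y_t = t)
```

Co-Tuning then translates each target label into a soft source label through this relationship and fine-tunes with both the target classification loss and a loss on the retained source head.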
Open Source Code (Yes): Code is available at https://github.com/thuml/CoTuning.

Open Datasets (Yes): In computer vision, we have models pre-trained on the ImageNet (Deng et al., 2009) classification task... For medium-scale classification tasks, we use the CUB-200-2011 (Welinder et al., 2010), Stanford Cars (Krause et al., 2013), and FGVC Aircraft (Maji et al., 2013) datasets. ... The large-scale dataset is constructed from the COCO 2017 object detection task. ... We experiment with the English named entity recognition (NER) task of CoNLL 2003 (Sang & De Meulder, 2003).

Dataset Splits (Yes): For datasets without validation splits, 20% of the training data is used for validation (split once, after which the validation set is fixed) and the remaining 80% is used for training. This way, each dataset has a train/val/test split.
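A minimal sketch of such a fixed one-time split using scikit-learn (which the paper mentions). The label array, seed, and stratification below are illustrative assumptions, not details from the paper.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder labels standing in for the official training set of a real dataset.
train_labels = np.random.randint(0, 200, size=6000)

indices = np.arange(len(train_labels))
train_idx, val_idx = train_test_split(
    indices,
    test_size=0.2,          # 20% of the training data held out for validation
    random_state=0,         # split once with a fixed seed; the val set then stays fixed
    stratify=train_labels,  # assumption: class-stratified (the paper does not say)
)
```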
Hardware Specification (No): The paper mentions '10K GPU hours' in the context of hyperparameter tuning for other methods, but does not specify the exact GPU models, CPU models, or other hardware used for its own experiments.

Software Dependencies (No): The paper mentions using PyTorch and scikit-learn for the implementation but does not provide specific version numbers for these or other software dependencies.

Experiment Setup (Yes): The learning rate for randomly initialized parameters is ten times the learning rate for pre-trained parameters... Hyperparameters of Co-Tuning and the compared methods are selected by performance on target validation data... all models are optimized by SGD with 0.9 momentum. Each experiment is repeated three times with different random seeds to collect the mean and standard deviation of performance.
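A minimal PyTorch sketch of this optimizer setup, assuming a ResNet-50 backbone with a freshly initialized classification head. The number of classes and the base learning rate are illustrative; the paper selects such hyperparameters on target validation data.

```python
import torch
from torchvision import models

num_classes = 200  # illustrative, e.g., CUB-200-2011
model = models.resnet50(pretrained=True)
model.fc = torch.nn.Linear(model.fc.in_features, num_classes)  # randomly initialized head

base_lr = 0.01  # illustrative; the actual value is tuned on target validation data
backbone_params = [p for n, p in model.named_parameters() if not n.startswith("fc.")]
optimizer = torch.optim.SGD(
    [
        {"params": backbone_params, "lr": base_lr},             # pre-trained parameters
        {"params": model.fc.parameters(), "lr": 10 * base_lr},  # new parameters: 10x LR
    ],
    momentum=0.9,
)
```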