Co-Tuning for Transfer Learning
Authors: Kaichao You, Zhi Kou, Mingsheng Long, Jianmin Wang
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | A simple instantiation of the framework shows strong empirical results in four visual classification tasks and one NLP classification task, bringing up to 20% relative improvement. |
| Researcher Affiliation | Academia | Kaichao You, Zhi Kou, Mingsheng Long, Jianmin Wang; School of Software, BNRist, Research Center for Big Data, Tsinghua University, China; {ykc20,kz19}@mails.tsinghua.edu.cn, {mingsheng,jimwang}@tsinghua.edu.cn |
| Pseudocode | Yes | Algorithm 1 Category relationship learning (the reverse approach), Algorithm 2 Neural network calibration |
| Open Source Code | Yes | Code is available at https://github.com/thuml/CoTuning. |
| Open Datasets | Yes | In computer vision, we have models pre-trained on the ImageNet (Deng et al., 2009) classification task... For medium-scale classification tasks, we use CUB-200-2011 (Welinder et al., 2010), Stanford Cars (Krause et al., 2013), and FGVC Aircraft (Maji et al., 2013) datasets. ...The large-scale dataset is constructed from the COCO 2017 object detection task. ... We experiment with the English named entity recognition (NER) task in CoNLL 2003 (Sang & De Meulder, 2003). |
| Dataset Splits | Yes | For datasets without validation splits, 20% of the training data is used for validation (split once, after which the validation set is fixed) and the remaining 80% is used for training. This way, each dataset has a train/val/test split (see the split sketch after this table). |
| Hardware Specification | No | The paper mentions '10K GPU hours' in the context of hyperparameter tuning for other methods, but does not specify the exact GPU models, CPU models, or other hardware specifications used for their own experiments. |
| Software Dependencies | No | The paper mentions using 'PyTorch' and 'scikit-learn' for implementation but does not provide specific version numbers for these or other software dependencies. |
| Experiment Setup | Yes | The learning rate for randomly initialized parameters is ten times the learning rate for pre-trained parameters... Hyper-parameters of Co-Tuning and compared methods are selected by the performance on target validation data... all models are optimized by SGD with 0.9 momentum. Each experiment is repeated three times with different random seeds to collect the mean and standard deviation of the performance (see the optimizer sketch after this table). |
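
The dataset-splits row describes a single, fixed 80/20 split of the original training data. Below is a minimal sketch of that protocol using scikit-learn (which the paper mentions as a dependency); the function name `make_split`, the seed value, and the use of stratification are illustrative assumptions, not taken from the authors' released code.

```python
# Sketch of a fixed 80/20 train/validation split, assuming scikit-learn.
from sklearn.model_selection import train_test_split

def make_split(samples, labels, seed=0):
    """Split the original training set once into 80% train / 20% validation.

    The split uses a fixed seed so the validation set stays identical across
    runs, matching the "split once and then the validation set is fixed"
    protocol quoted above.
    """
    train_x, val_x, train_y, val_y = train_test_split(
        samples, labels,
        test_size=0.2,      # 20% of the training data becomes validation data
        stratify=labels,    # keep class proportions (an assumption, not stated in the paper)
        random_state=seed,  # fixed seed: the validation set never changes
    )
    return (train_x, train_y), (val_x, val_y)
```

The experiment-setup row states that randomly initialized parameters are trained with ten times the learning rate of pre-trained parameters, using SGD with momentum 0.9. A minimal PyTorch sketch of that optimizer configuration is shown below; the base learning rate of 0.01, the ResNet-50 backbone, and the 200-class head are assumptions for illustration only.

```python
# Sketch of SGD with per-group learning rates, assuming a torchvision ResNet-50.
import torch
import torchvision

base_lr = 0.01                                           # assumed base learning rate
model = torchvision.models.resnet50(pretrained=True)     # pre-trained backbone
model.fc = torch.nn.Linear(model.fc.in_features, 200)    # new head, e.g. 200 classes for CUB-200-2011

# Separate the randomly initialized head from the pre-trained backbone parameters.
head_params = list(model.fc.parameters())
backbone_params = [p for name, p in model.named_parameters()
                   if not name.startswith("fc.")]

optimizer = torch.optim.SGD(
    [
        {"params": backbone_params, "lr": base_lr},       # pre-trained parameters
        {"params": head_params, "lr": base_lr * 10},      # new parameters: ten times the learning rate
    ],
    momentum=0.9,
)
```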
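In both sketches, hyper-parameter values not quoted in the table (seed, base learning rate, backbone, number of classes) should be replaced with the settings selected on the target validation data, as the paper prescribes.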