Improved Fine-Tuning by Better Leveraging Pre-Training Data
Authors: Ziquan Liu, Yi Xu, Yuanhong Xu, Qi Qian, Hao Li, Xiangyang Ji, Antoni Chan, Rong Jin
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experimental results for image classification tasks on 8 benchmark data sets verify the effectiveness of the proposed data selection based fine-tuning pipeline. |
| Researcher Affiliation | Collaboration | (1) Department of Computer Science, City University of Hong Kong; (2) School of Artificial Intelligence, Dalian University of Technology; (3) DAMO Academy, Alibaba Group; (4) Department of Automation, Tsinghua University |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks, only descriptions of the proposed methods in text. |
| Open Source Code | Yes | Our code is available at https://github.com/ziquanliu/NeurIPS2022_UOT_fine_tuning. |
| Open Datasets | Yes | The pre-trained model is tested on 8 target image classification data sets, i.e. Stanford dogs (Dogs) [34], Stanford cars (Cars) [35], Caltech-UCSD birds (CUB) [7], Oxford-IIIT Pet (Pets) [36], SUN [37], FGVC-Aircraft (Aircraft) [38], Describable Textures data set (DTD) [39] and Caltech101 (Caltech) [40]. |
| Dataset Splits | Yes | we search the initial learning rate from {1e-4, 3e-4, 1e-3, 3e-3, 1e-2} on a validation set and report the test accuracy trained on the original training or train+val set. |
| Hardware Specification | No | The paper states 'See the supplemental' for the total amount of compute and type of resources used, but these details are not provided within the main body of the paper. |
| Software Dependencies | No | The paper mentions software components such as ResNet18, MoCo-v2, and K-means, but does not provide specific version numbers for these or for other core software dependencies used in the experiments. |
| Experiment Setup | Yes | The training epochs are fixed to be 100 in our experiment for sufficient training and the learning rate is divided by 10 at epoch 60 and 80. Other hyperparameters like initial learning rate, weight decay and λ are determined by grid search... The batch size for fine-tuning data is 256... In UOT, we set ε = 1.0, τ1 = 1.0 and τ2 = 100.0. The distance cost is based on the cosine similarity, C_ij = (cos(a_i, b_j) + 1)/ε_c with ε_c = 0.01. |
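
As a rough illustration of the UOT selection step quoted in the Experiment Setup row, the sketch below pairs the stated hyperparameters (ε = 1.0, τ1 = 1.0, τ2 = 100.0, ε_c = 0.01) with a generic entropic unbalanced-OT solver. It is not the authors' released implementation (linked in the Open Source Code row): the fraction reading of the cost, the random placeholder features, and the generalized Sinkhorn scaling loop are all assumptions made for the sketch.

```python
import numpy as np

# Minimal sketch of a UOT-based pre-training data selection step, NOT the
# authors' released pipeline. Assumptions: the quoted cost is read as
# C_ij = (cos(a_i, b_j) + 1) / eps_c, features are stand-in random vectors,
# and a generic entropic UOT solver (generalized Sinkhorn scaling with
# KL-relaxed marginals) is used.

def cosine_cost(A, B, eps_c=0.01):
    """Pairwise cost C_ij = (cos(a_i, b_j) + 1) / eps_c between feature rows."""
    A = A / np.linalg.norm(A, axis=1, keepdims=True)
    B = B / np.linalg.norm(B, axis=1, keepdims=True)
    return (A @ B.T + 1.0) / eps_c

def uot_plan(a, b, C, eps=1.0, tau1=1.0, tau2=100.0, n_iter=1000):
    """Entropic unbalanced OT plan via alternating scaling updates.
    A log-domain implementation would be preferable for numerical stability."""
    K = np.exp(-C / eps)                                 # Gibbs kernel
    u, v = np.ones_like(a), np.ones_like(b)
    p1, p2 = tau1 / (tau1 + eps), tau2 / (tau2 + eps)    # KL-relaxation exponents
    for _ in range(n_iter):
        u = (a / (K @ v)) ** p1
        v = (b / (K.T @ u)) ** p2
    return u[:, None] * K * v[None, :]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pretrain_feats = rng.normal(size=(500, 128))   # placeholder pre-training features
    target_feats = rng.normal(size=(100, 128))     # placeholder target-task features
    C = cosine_cost(pretrain_feats, target_feats)
    plan = uot_plan(np.full(500, 1 / 500), np.full(100, 1 / 100), C)
    # Mass transported from each pre-training sample; a selection rule could
    # keep the samples with the largest (or smallest) totals, depending on how
    # the cost is oriented in the original method.
    mass_per_source = plan.sum(axis=1)
    print(mass_per_source[:5])
```

The scaling loop follows the standard generalized Sinkhorn iteration for KL-relaxed unbalanced OT; the released repository should be treated as authoritative for the exact cost orientation and selection rule.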