TransTailor: Pruning the Pre-trained Model for Improved Transfer Learning
Authors: Bingyan Liu, Yifeng Cai, Yao Guo, Xiangqun Chen
AAAI 2021, pp. 8627–8634
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on multiple pre-trained models and datasets demonstrate that TransTailor outperforms the traditional pruning methods and achieves competitive or even better performance than other state-of-the-art transfer learning methods while using a smaller model. |
| Researcher Affiliation | Academia | Bingyan Liu¹, Yifeng Cai², Yao Guo¹, Xiangqun Chen¹. ¹MOE Key Lab of HCST, Dept. of Computer Science, School of EECS, Peking University; ²School of Software and Microelectronics, Peking University. {lby_cs, caiyifeng, yaoguo, cherry}@pku.edu.cn |
| Pseudocode | Yes | Algorithm 1: The Pipeline of TransTailor (a hedged sketch of this loop follows the table). |
| Open Source Code | No | The paper does not provide any explicit statements about releasing source code or links to a code repository for Trans Tailor. |
| Open Datasets | Yes | We evaluate TransTailor on the following five datasets that are widely used in transfer learning: Caltech256-30 & Caltech256-60 (Griffin, Holub, and Perona 2007), CUB-200 (Wah et al. 2011), Stanford Dogs (Khosla et al. 2011), and MIT Indoor-67 (Quattoni and Torralba 2009). All models are pre-trained on the ImageNet-2012 (Deng et al. 2009) dataset. |
| Dataset Splits | No | The paper mentions '80 training images and 20 test images' for MIT Indoor-67 and '30 and 60 randomly sampled training examples' for Caltech256, and the iterative process selects the optimal sub-model by accuracy on the target data Dt, but it does not give a general, explicit train/validation/test split for all datasets in a reproducible manner. For example, it never states the size or percentage of a dedicated validation set (a hedged sketch of one plausible per-class split follows the table). |
| Hardware Specification | No | The paper does not specify the hardware used for conducting the experiments (e.g., GPU models, CPU types). |
| Software Dependencies | No | All experiments are conducted with the PyTorch framework, and pre-trained models are provided by Torchvision. However, no specific version numbers for PyTorch or Torchvision are provided (a loading sketch under assumed versions follows the table). |
| Experiment Setup | Yes | The learning rate is set to 0.005 for the FC layer and 0.0005 for Conv layers; τ is set to 0.3. After 10% of the pre-trained model's FLOPs are pruned with the target-aware pruning, the importance-aware fine-tuning runs for 40 epochs for ResNet-101 and 60 epochs for VGG-16 (an optimizer sketch reproducing these learning rates follows the table). |
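
The pseudocode quoted above (Algorithm 1) alternates target-aware pruning with importance-aware fine-tuning until the pruned sub-model stops improving on the target task. Below is a minimal sketch of that loop, not the authors' code: the injected callables (`prune_step`, `finetune`, `evaluate`) are hypothetical stand-ins for the paper's target-aware pruning, importance-aware fine-tuning, and target-data evaluation, and the stopping rule is a simplification of the paper's criterion.

```python
import copy
from typing import Callable

def transtailor_loop(model,
                     prune_step: Callable,   # target-aware pruning step
                     finetune: Callable,     # importance-aware fine-tuning
                     evaluate: Callable,     # accuracy on target data
                     flops_step: float = 0.10):
    """Alternate pruning and fine-tuning, keeping the best sub-model seen."""
    best_model, best_acc = copy.deepcopy(model), evaluate(model)
    while True:
        model = prune_step(model, flops_frac=flops_step)  # drop ~10% of FLOPs
        model = finetune(model)                           # recover accuracy
        acc = evaluate(model)
        if acc < best_acc:  # simplification: stop once pruning starts to hurt
            return best_model
        best_model, best_acc = copy.deepcopy(model), acc
```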
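Because the paper reports per-class training counts (30 or 60 images per class for Caltech256) but publishes no split files, a re-implementation has to construct its own splits. The following is a hedged sketch of one plausible per-class sampling scheme; the seed and the use of all remaining images as the test set are assumptions, not the paper's protocol.

```python
import random
from collections import defaultdict
from torchvision.datasets import ImageFolder

def per_class_split(dataset: ImageFolder, n_train: int, seed: int = 0):
    """Randomly pick n_train images per class for training; rest go to test."""
    by_class = defaultdict(list)
    for idx, (_, label) in enumerate(dataset.samples):
        by_class[label].append(idx)
    rng = random.Random(seed)  # assumption: the paper gives no seed
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        train_idx += indices[:n_train]
        test_idx += indices[n_train:]
    return train_idx, test_idx
```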
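Since only "PyTorch" and "Torchvision" are named without versions, any re-implementation must pin its own. With torchvision 0.13 or later, the two ImageNet-pre-trained backbones the paper evaluates would be loaded as below; the `weights` enum API is an assumption about the reader's version (older releases use `pretrained=True` instead).

```python
from torchvision import models

# ImageNet-pre-trained backbones, as provided by Torchvision (>= 0.13 API).
vgg16 = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)
resnet101 = models.resnet101(weights=models.ResNet101_Weights.IMAGENET1K_V1)
```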
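The quoted setup assigns separate learning rates to the FC layer (0.005) and the Conv layers (0.0005). A minimal sketch of one way to realize that split with SGD parameter groups follows; the optimizer choice and momentum value are assumptions, since this excerpt does not report them.

```python
import torch
from torchvision import models

model = models.resnet101(weights=models.ResNet101_Weights.IMAGENET1K_V1)

# In torchvision's ResNet the classifier is the module named "fc";
# for VGG-16 the equivalent module is named "classifier".
fc_params = [p for n, p in model.named_parameters() if n.startswith("fc.")]
conv_params = [p for n, p in model.named_parameters() if not n.startswith("fc.")]

optimizer = torch.optim.SGD(
    [{"params": fc_params, "lr": 0.005},      # FC layer, per the paper
     {"params": conv_params, "lr": 0.0005}],  # Conv layers, per the paper
    momentum=0.9,  # assumption: momentum is not reported in this excerpt
)
```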