Efficient ConvBN Blocks for Transfer Learning and Beyond
Authors: Kaichao You, Guo Qin, Anchang Bao, Meng Cao, Ping Huang, Jiulong Shan, Mingsheng Long
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through extensive experiments in object detection, classification, and adversarial example generation across 5 datasets and 12 model architectures, we demonstrate that the proposed Tune mode retains the performance while significantly reducing GPU memory footprint and training time, thereby contributing efficient ConvBN blocks for transfer learning and beyond. |
| Researcher Affiliation | Collaboration | Kaichao You, Guo Qin, Anchang Bao, Meng Cao, Ping Huang, Jiulong Shan, Mingsheng Long. School of Software, BNRist, Tsinghua University, China; Apple. {ykc20,bac20,qing20}@mails.tsinghua.edu.cn; {mengcao,Huang_ping,jlshan}@apple.com; mingsheng@tsinghua.edu.cn |
| Pseudocode | Yes | Appendix C includes 'Listing 1: Computation details for consecutive Convolution and Batch Norm layers in different modes', which provides structured Python-like code snippets for Train, Eval, and Deploy modes (a hedged sketch of these computations, together with the paper's Tune mode, follows this table). |
| Open Source Code | Yes | Our method has been integrated into both PyTorch (general machine learning framework) and MMCV/MMEngine (computer vision framework). Practitioners just need one line of code to enjoy our efficient ConvBN blocks thanks to PyTorch's built-in machine learning compilers. Our algorithm has been integrated into PyTorch core, MMCV, and MMEngine. (A hedged usage sketch of the one-line torch.compile path follows this table.) |
| Open Datasets | Yes | Our algorithm has been tested against 5 datasets: CUB-200 (Wah et al., 2011), Stanford Cars (Krause et al., 2013), Aircrafts (Maji et al., 2013), COCO (Lin et al., 2014), ImageNet (Deng et al., 2009). |
| Dataset Splits | No | The paper discusses 'Eval mode' which 'can also be used to validate models during development' and provides training hyperparameters. However, it does not explicitly specify details for a validation dataset split, such as percentages, sample counts, or how validation was used for hyperparameter tuning or early stopping. |
| Hardware Specification | Yes | The total computation for results reported in this paper is about 3400 hours of V100 GPU (32GB) counted by our internal computing infrastructure. |
| Software Dependencies | Yes | Our method has been integrated into PyTorch core since version 2.2. For people using older versions of PyTorch (we require PyTorch newer than 1.8). |
| Experiment Setup | Yes | The below settings are taken from the default values in the TLlib library: ResNet-50 is the backbone network and all parameters are optimized by Stochastic Gradient Descent with 0.9 momentum and 0.0005 weight decay. Each training process consisted of 20 epochs, with 500 iterations per epoch. We set the initial learning rates to 0.001 and 0.01 for the feature extractor and linear projection head respectively, and scheduled the learning rates of all layers to decay by 0.1 at epochs 8 and 12. The input images were all resized and cropped to 448×448, and the batch size was fixed at 48. (A hedged PyTorch sketch of this optimizer and schedule follows this table.) |
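
To make the mode distinctions in the Pseudocode row concrete, here is a minimal sketch of consecutive Conv + BN computation for a single `torch.nn.Conv2d` followed by a `torch.nn.BatchNorm2d`. It illustrates the Eval, Deploy, and Tune mode ideas described in the paper; it is not the authors' Listing 1 verbatim, and the function names and shape handling are assumptions.

```python
# Minimal sketch (not the paper's Listing 1 verbatim) of consecutive Conv + BN
# computation in Eval, Tune, and Deploy modes; names and shapes are illustrative.
import torch
import torch.nn.functional as F


def eval_mode(x, conv: torch.nn.Conv2d, bn: torch.nn.BatchNorm2d):
    """Eval mode: run the convolution, then normalize with BN's frozen running stats."""
    y = F.conv2d(x, conv.weight, conv.bias,
                 conv.stride, conv.padding, conv.dilation, conv.groups)
    scale = bn.weight / torch.sqrt(bn.running_var + bn.eps)  # per-channel scale
    return (y - bn.running_mean.view(1, -1, 1, 1)) * scale.view(1, -1, 1, 1) \
        + bn.bias.view(1, -1, 1, 1)


def tune_mode(x, conv: torch.nn.Conv2d, bn: torch.nn.BatchNorm2d):
    """Tune mode: fold the frozen BN statistics into the conv parameters *before*
    the convolution. Only one convolution output is produced, so the pre-BN
    activation never has to be stored for backward, while gradients still reach
    conv.weight/conv.bias and bn.weight/bn.bias through the folded tensors."""
    scale = bn.weight / torch.sqrt(bn.running_var + bn.eps)
    folded_weight = conv.weight * scale.view(-1, 1, 1, 1)
    conv_bias = conv.bias if conv.bias is not None else torch.zeros_like(bn.running_mean)
    folded_bias = bn.bias + (conv_bias - bn.running_mean) * scale
    return F.conv2d(x, folded_weight, folded_bias,
                    conv.stride, conv.padding, conv.dilation, conv.groups)


@torch.no_grad()
def deploy_mode(conv: torch.nn.Conv2d, bn: torch.nn.BatchNorm2d) -> torch.nn.Conv2d:
    """Deploy mode: permanently fold BN into a new conv for inference-only use."""
    scale = bn.weight / torch.sqrt(bn.running_var + bn.eps)
    conv_bias = conv.bias if conv.bias is not None else torch.zeros_like(bn.running_mean)
    folded = torch.nn.Conv2d(conv.in_channels, conv.out_channels, conv.kernel_size,
                             conv.stride, conv.padding, conv.dilation, conv.groups,
                             bias=True)
    folded.weight.copy_(conv.weight * scale.view(-1, 1, 1, 1))
    folded.bias.copy_(bn.bias + (conv_bias - bn.running_mean) * scale)
    return folded
```

As a quick sanity check, for a `Conv2d(3, 8, 3)` + `BatchNorm2d(8)` pair with frozen statistics, the three functions above should agree numerically up to floating-point error on the same input.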
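The "one line of code" mentioned in the Open Source Code row works through PyTorch's built-in compiler stack. Below is a hedged usage sketch: `torch.compile` is a stable API, but the specific inductor option toggling the ConvBN rewrite is an assumption and should be verified against the installed PyTorch (>= 2.2 per the paper) or the MMCV/MMEngine documentation.

```python
# Hedged sketch of enabling the efficient ConvBN rewrite via torch.compile.
# The inductor option name below is an assumption; check torch._inductor.config
# for your PyTorch release (the paper states the integration landed in 2.2).
import torch
import torch._inductor.config as inductor_config

model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 16, 3, padding=1, bias=False),
    torch.nn.BatchNorm2d(16),
    torch.nn.ReLU(),
)
model.train()
model[1].eval()  # freeze BN statistics so the ConvBN block is in the Eval/Tune regime

inductor_config.efficient_conv_bn_eval_fx_passes = True  # assumed flag name

compiled = torch.compile(model)  # the "one line" referred to above
loss = compiled(torch.randn(4, 3, 32, 32)).sum()
loss.backward()                  # conv weights still receive gradients
```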
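For concreteness, the Experiment Setup row translates into roughly the following PyTorch optimizer and schedule. This is a hedged reconstruction of the quoted TLlib defaults, not the authors' script; the backbone/head construction, the class count, and the torchvision weights identifier are assumptions.

```python
# Hedged reconstruction of the quoted TLlib-style fine-tuning setup (not the
# authors' script). Only the optimizer, learning rates, and schedule follow the
# values in the table; the model construction is illustrative.
import torch
import torchvision

backbone = torchvision.models.resnet50(weights="IMAGENET1K_V1")
backbone.fc = torch.nn.Identity()   # use ResNet-50 as a feature extractor
head = torch.nn.Linear(2048, 200)   # e.g. 200 classes for CUB-200

optimizer = torch.optim.SGD(
    [
        {"params": backbone.parameters(), "lr": 0.001},  # feature extractor
        {"params": head.parameters(), "lr": 0.01},       # linear projection head
    ],
    momentum=0.9,
    weight_decay=0.0005,
)
# Decay all learning rates by 0.1 at epochs 8 and 12 (20 epochs of 500 iterations,
# batch size 48, inputs resized and cropped to 448x448).
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[8, 12], gamma=0.1)
```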