DTL: Disentangled Transfer Learning for Visual Recognition

Authors: Minghao Fu, Ke Zhu, Jianxin Wu

Venue: AAAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conducted extensive experiments to validate the effectiveness of our method. The proposed method not only reduces a large amount of GPU memory usage and trainable parameters, but also outperforms existing PETL methods by a significant margin in accuracy, achieving new state-of-the-art on several standard benchmarks.
Researcher Affiliation | Academia | National Key Laboratory for Novel Software Technology, Nanjing University, China; School of Artificial Intelligence, Nanjing University, China; fumh@lamda.nju.edu.cn, zhuk@lamda.nju.edu.cn, wujx2001@gmail.com
Pseudocode | No | The paper provides mathematical formulations and network diagrams but does not include structured pseudocode or algorithm blocks.
Open Source Code | No | The paper states 'More details are available at https://www.lamda.nju.edu.cn/fumh/files/DTL/DTL_appendix.pdf' but does not explicitly state that source code for the described methodology is provided, nor does it link directly to a code repository.
Open Datasets | Yes | We conducted thorough experiments to evaluate the proposed method. First, we present results on the VTAB-1K (Zhai et al. 2019) benchmark... We further evaluate on five fine-grained few-shot learning benchmarks: Aircraft (Maji et al. 2013), Pets (Parkhi et al. 2012), Food-101 (Bossard, Guillaumin, and Van Gool 2014), Cars (Krause et al. 2013) and Flowers102 (Nilsback and Zisserman 2008). (See the dataset-loading sketch after the table.)
Dataset Splits | Yes | VTAB-1K was introduced by Zhai et al. (2019)... there are only 1,000 images in each dataset for training. ... we fine-tune the pre-trained model with a training set containing {1, 2, 4, 8, 16} shots per class and report the average accuracy on the test set over 3 seeds. (See the few-shot sampling sketch after the table.)
Hardware Specification | Yes | Throughput (number of images processed per second with ViT-B/16 as the backbone) measured on a single NVIDIA 3090 GPU with mixed-precision inference. (See the throughput-measurement sketch after the table.)
Software Dependencies | No | The paper mentions 'AdamW' and 'cosine learning rate schedule' as optimizer details but does not provide specific version numbers for any software dependencies.
Experiment Setup | Yes | Following previous work (Lian et al. 2022; Jie and Deng 2023), we take AdamW (Loshchilov and Hutter 2019) with a cosine learning rate schedule as the optimizer. β in Swish is fixed to 100. All pre-trained models are fine-tuned for 100 epochs with batch size 32. The rank d of low-rank linear mappings in CSN is 2 for ViT and 4 for Swin-B. We set M (cf. Eq. 8-9) of DTL and DTL+ as 7 for the ViT backbone. (See the training-setup sketch after the table.)
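
For the Open Datasets row, a minimal sketch of loading the five fine-grained benchmarks, assuming torchvision's built-in dataset classes and a standard ImageNet preprocessing pipeline for a ViT-B/16 backbone; the quoted text does not describe the authors' data pipeline, so the split names and transform below follow torchvision conventions rather than the paper's code.

```python
from torchvision import datasets, transforms

# Typical 224x224 ImageNet-style preprocessing for a ViT-B/16 backbone
# (an assumption; the quoted text does not specify the transform).
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

root = "data"  # hypothetical download location
aircraft = datasets.FGVCAircraft(root, split="trainval", transform=preprocess, download=True)
pets     = datasets.OxfordIIITPet(root, split="trainval", transform=preprocess, download=True)
food     = datasets.Food101(root, split="train", transform=preprocess, download=True)
flowers  = datasets.Flowers102(root, split="train", transform=preprocess, download=True)
# StanfordCars may require a manual download in recent torchvision releases.
cars     = datasets.StanfordCars(root, split="train", transform=preprocess, download=False)
```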
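
For the Dataset Splits row, a minimal sketch of the {1, 2, 4, 8, 16}-shot protocol, assuming a simple per-class random subsample with a fixed seed. The helper `k_shot_subset` is hypothetical; the authors' exact sampling code is not given in the quoted text.

```python
import random
from collections import defaultdict

from torch.utils.data import Subset

def k_shot_subset(dataset, k, seed):
    """Sample k training images per class with a fixed random seed."""
    rng = random.Random(seed)
    indices_by_class = defaultdict(list)
    for idx in range(len(dataset)):
        _, label = dataset[idx]  # loads the sample only to read its label
        indices_by_class[label].append(idx)
    chosen = []
    for indices in indices_by_class.values():
        rng.shuffle(indices)
        chosen.extend(indices[:k])
    return Subset(dataset, chosen)

# Per the quote: {1, 2, 4, 8, 16}-shot training sets, with test accuracy
# averaged over 3 seeds.
shots, seeds = [1, 2, 4, 8, 16], [0, 1, 2]
```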
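
For the Hardware Specification row, a minimal sketch of measuring throughput (images per second) with mixed-precision inference on a single GPU. The batch size, warm-up, and iteration counts are arbitrary assumptions, and the timm model name stands in for whichever ViT-B/16 checkpoint the authors used.

```python
import time

import timm
import torch

# Assumes the timm implementation of ViT-B/16; the paper only states that
# throughput was measured on a single NVIDIA 3090 with mixed precision.
model = timm.create_model("vit_base_patch16_224", pretrained=False).cuda().eval()
batch = torch.randn(64, 3, 224, 224, device="cuda")

def images_per_second(model, batch, iters=50, warmup=10):
    with torch.no_grad(), torch.cuda.amp.autocast():  # mixed-precision inference
        for _ in range(warmup):
            model(batch)
        torch.cuda.synchronize()
        start = time.time()
        for _ in range(iters):
            model(batch)
        torch.cuda.synchronize()
    return iters * batch.size(0) / (time.time() - start)

print(f"{images_per_second(model, batch):.1f} images/s")
```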
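
For the Experiment Setup row, a minimal sketch combining the quoted hyper-parameters: AdamW with a cosine learning-rate schedule over 100 epochs, a Swish activation with β fixed to 100, and a rank-d bottleneck with d = 2 for ViT. `LowRankSwish` is a hypothetical stand-in for the low-rank mappings in DTL's CSN, not the authors' implementation, and the learning rate and weight decay are placeholders since the quoted text does not state them.

```python
import torch
import torch.nn as nn

class LowRankSwish(nn.Module):
    """Rank-d bottleneck followed by a fixed-beta Swish activation.

    Hypothetical stand-in using only the hyper-parameters quoted above
    (d = 2 for ViT, beta = 100); not the authors' CSN code.
    """
    def __init__(self, dim, d=2, beta=100.0):
        super().__init__()
        self.down = nn.Linear(dim, d, bias=False)
        self.up = nn.Linear(d, dim, bias=False)
        self.beta = beta

    def forward(self, x):
        h = self.down(x)
        h = h * torch.sigmoid(self.beta * h)  # Swish(x) = x * sigmoid(beta * x)
        return self.up(h)

# AdamW with a cosine schedule over 100 epochs, as quoted; lr and weight
# decay are placeholder values.
module = LowRankSwish(dim=768, d=2)  # 768 = ViT-B/16 hidden size
optimizer = torch.optim.AdamW(module.parameters(), lr=1e-3, weight_decay=1e-4)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=100)

for epoch in range(100):
    # ... one epoch of fine-tuning with batch size 32 ...
    scheduler.step()
```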