DTL: Disentangled Transfer Learning for Visual Recognition
Authors: Minghao Fu, Ke Zhu, Jianxin Wu
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conducted extensive experiments to validate the effectiveness of our method. The proposed method not only reduces a large amount of GPU memory usage and trainable parameters, but also outperforms existing PETL methods by a significant margin in accuracy, achieving new state-of-the-art on several standard benchmarks. |
| Researcher Affiliation | Academia | National Key Laboratory for Novel Software Technology, Nanjing University, China School of Artificial Intelligence, Nanjing University, China fumh@lamda.nju.edu.cn, zhuk@lamda.nju.edu.cn, wujx2001@gmail.com |
| Pseudocode | No | The paper provides mathematical formulations and network diagrams but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper states 'More details are available at https://www.lamda.nju.edu.cn/fumh/files/DTL/DTL_appendix.pdf' but does not explicitly state that source code for the methodology described is provided or link directly to a code repository. |
| Open Datasets | Yes | We conducted thorough experiments to evaluate the proposed method. First, we present results on the VTAB-1K (Zhai et al. 2019) benchmark... We further evaluate on five fine-grained few-shot learning benchmarks: Aircraft (Maji et al. 2013), Pets (Parkhi et al. 2012), Food-101 (Bossard, Guillaumin, and Van Gool 2014), Cars (Krause et al. 2013) and Flowers102 (Nilsback and Zisserman 2008). |
| Dataset Splits | Yes | VTAB-1K was introduced by Zhai et al. (2019)... there are only 1,000 images in each dataset for training. ... we fine-tune the pre-trained model with training set containing {1, 2, 4, 8, 16}-shot per class and report the average accuracy on test set over 3 seeds. |
| Hardware Specification | Yes | Throughput (number of images processed per second with ViT-B/16 as the backbone) measured on a single NVIDIA 3090 GPU with mixed precision inference. |
| Software Dependencies | No | The paper mentions 'AdamW' and 'cosine learning rate schedule' as optimizer details but does not provide specific version numbers for any software dependencies. |
| Experiment Setup | Yes | Following previous work (Lian et al. 2022; Jie and Deng 2023), we take AdamW (Loshchilov and Hutter 2019) with cosine learning rate schedule as the optimizer. β in Swish is fixed to 100. All pre-trained models are fine-tuned by 100 epochs with batch size 32. The rank d of low-rank linear mappings in CSN is 2 for ViT and 4 for Swin-B. We set M (cf. Eq. 8-9) of DTL and DTL+ as 7 for the ViT backbone |
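The setup row above mentions two concrete ingredients: a Swish activation with β fixed to 100 and a cosine learning-rate schedule. As a minimal, framework-free sketch (the function names and the decay-to-zero endpoint are assumptions, not taken from the paper), both can be written as:

```python
import math

def _sigmoid(z: float) -> float:
    # Numerically stable logistic sigmoid.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def swish(x: float, beta: float = 100.0) -> float:
    """Swish activation: x * sigmoid(beta * x).
    With beta fixed to 100 (as in the paper), it closely approximates ReLU."""
    return x * _sigmoid(beta * x)

def cosine_lr(step: int, total_steps: int, base_lr: float) -> float:
    """Cosine learning-rate schedule decaying from base_lr toward 0
    over total_steps (endpoint value is an assumption here)."""
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * step / total_steps))
```

In a typical training loop the learning rate would be recomputed each step (or epoch) and handed to an AdamW-style optimizer; with β = 100, `swish(x)` is close to `max(x, 0)` except in a narrow band around zero.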