Efficient Low-rank Backpropagation for Vision Transformer Adaptation

Authors: Yuedong Yang, Hung-Yueh Chiang, Guihong Li, Diana Marculescu, Radu Marculescu

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We conduct extensive experiments with different models (ViT, hybrid convolution-ViT model) on multiple datasets to demonstrate the effectiveness of our method."
Researcher Affiliation | Academia | "Yuedong Yang, Hung-Yueh Chiang, Guihong Li, Diana Marculescu, Radu Marculescu; Chandra Family Department of Electrical and Computer Engineering, The University of Texas at Austin; {albertyoung,hungyueh.chiang,lgh,dianam,radum}@utexas.edu"
Pseudocode | Yes | "Algorithm 1: Backpropagation through a linear layer with LBP-WHT." (a hedged Python sketch of this step follows the table)
Open Source Code | Yes | "Code: https://github.com/SLDGroup/LBP-WHT"
Open Datasets | Yes | "We use ImageNet [21]-pretrained ViTs and finetune them on six different datasets, namely, CIFAR100 [22] (CF100), CIFAR10 [22] (CF10), Cars [23], Flowers [24], Food [25], and Pets [26]. We use the ADE20K [29]-pretrained Segformer-mit-b0 [30] model and finetune it on two datasets, Cityscapes [31] (City) and the enhanced Pascal-VOC 2012 [32] (VOC12A)."
Dataset Splits | No | The paper uses well-known datasets (CIFAR100, CIFAR10, ImageNet, ADE20K, Cityscapes, Pascal-VOC 2012) that come with predefined splits, but it does not explicitly state the training/validation/test split percentages or sample counts it uses.
Hardware Specification | Yes | "Models are trained with an NVIDIA A6000 GPU. To determine the computational requirements of different models and methods, we run model training on an Intel 11900K CPU and measure the exact FLOPs using the embedded performance tool perf in the Linux kernel v5.15.87. For preliminary deployment results, we test our method on the last two linear layers of EfficientFormer-L1, using OpenBLAS and cuBLAS for CPU and GPU testing respectively on an NVIDIA Jetson Nano." (an example perf invocation follows the table)
Software Dependencies | Yes | "Environment: We set up our environment with PyTorch 1.13, MMClassification v0.25 and MMSegmentation v0.30."
Experiment Setup | Yes | "Each model is finetuned for 50 epochs using the AdamW [27] optimizer and a batch size of 64. The learning rate is adjusted for each dataset based on the performance of EfficientFormer-L1 [28] with vanilla BP." [classification] "The images are downscaled and cropped to a size of 512×512 pixels for training. Models are finetuned for 20,000 steps using the AdamW optimizer and a batch size of 8." [segmentation] (a toy configuration sketch follows the table)
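
For concreteness, here is a minimal sketch of the step that Algorithm 1 describes: low-rank backpropagation through a linear layer using a Walsh-Hadamard transform (WHT) projection. This is an illustration under simplifying assumptions, not the authors' implementation: the batch dimension is omitted, the basis uses the orthonormal scaling H/sqrt(n), and the first `rank` rows of the natural-order (Sylvester) Hadamard matrix stand in for the paper's frequency-ordered basis selection.

```python
import torch

def hadamard_matrix(n: int) -> torch.Tensor:
    """Sylvester construction of the n x n Walsh-Hadamard matrix (n a power of two)."""
    H = torch.ones(1, 1)
    while H.shape[0] < n:
        H = torch.cat([torch.cat([H, H], dim=1),
                       torch.cat([H, -H], dim=1)], dim=0)
    return H

def lbp_wht_linear_backward(x, grad_y, weight, rank):
    """Low-rank backward pass for y = x @ weight.T (bias omitted).

    x:      (n_tokens, in_features)  input saved during the forward pass
    grad_y: (n_tokens, out_features) gradient w.r.t. the layer output
    weight: (out_features, in_features)
    rank:   number of WHT basis vectors kept (rank << n_tokens)
    """
    n = x.shape[0]
    # Orthonormal WHT basis; taking the first `rank` rows is a simplification,
    # whereas the paper selects low-frequency bases.
    C = hadamard_matrix(n)[:rank] / n ** 0.5   # (rank, n_tokens)
    x_lr = C @ x                               # project the input
    gy_lr = C @ grad_y                         # project the output gradient
    grad_w = gy_lr.T @ x_lr                    # weight gradient in the low-rank space
    grad_x = C.T @ (gy_lr @ weight)            # lift back for the input gradient
    return grad_w, grad_x
```

With `rank == n_tokens` the projection is orthogonal and both gradients are exact; the savings come from choosing `rank` much smaller than `n_tokens`, so the expensive matrix products run over `rank` rows instead of the full token dimension.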
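The FLOP measurement could be reproduced along these lines. This is a sketch, not the authors' tooling: `train.py` is a hypothetical entry point, and the `fp_arith_inst_retired.*` events are the Intel hardware counters commonly used with perf to count retired floating-point operations on CPUs of this generation.

```python
import subprocess

# Count retired FP instructions with Linux perf while the (hypothetical)
# training script runs. Packed events count SIMD instructions, so multiply
# by the vector width (4 floats for 128b, 8 for 256b) to obtain FLOPs.
events = ",".join([
    "fp_arith_inst_retired.scalar_single",
    "fp_arith_inst_retired.128b_packed_single",
    "fp_arith_inst_retired.256b_packed_single",
])
subprocess.run(["perf", "stat", "-e", events, "--", "python", "train.py"], check=True)
```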
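Finally, a toy sketch of the classification finetuning configuration quoted above, assuming a standard PyTorch loop; the model, data, and learning rate are placeholders, since the paper tunes the learning rate per dataset and starts from ImageNet-pretrained ViTs.

```python
import torch
from torch.optim import AdamW
from torch.utils.data import DataLoader, TensorDataset

# Toy stand-ins for an ImageNet-pretrained ViT and a real dataset.
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 100))
train_loader = DataLoader(
    TensorDataset(torch.randn(256, 3, 32, 32), torch.randint(0, 100, (256,))),
    batch_size=64,  # batch size 64, per the paper
)

optimizer = AdamW(model.parameters(), lr=1e-4)  # placeholder LR; tuned per dataset
criterion = torch.nn.CrossEntropyLoss()

for epoch in range(50):  # 50 epochs, per the paper
    for images, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()  # LBP-WHT would replace the linear layers' backward here
        optimizer.step()
```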