AdaptFormer: Adapting Vision Transformers for Scalable Visual Recognition

Authors: Shoufa Chen, Chongjian Ge, Zhan Tong, Jiangliu Wang, Yibing Song, Jue Wang, Ping Luo

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on five image and video datasets show that AdaptFormer largely improves ViTs in the target domains.
Researcher Affiliation | Collaboration | Shoufa Chen¹, Chongjian Ge¹, Zhan Tong², Jiangliu Wang², Yibing Song², Jue Wang², Ping Luo¹ (¹The University of Hong Kong, ²Tencent AI Lab)
Pseudocode | No | The paper does not contain any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | Code is available at https://github.com/ShoufaChen/AdaptFormer.
Open Datasets | Yes | Image domain: CIFAR-100 [54]... Street View House Numbers (SVHN) [37]... The Food-101 [9] dataset... Video domain: Something-Something V2 (SSv2) [39]... HMDB51 [55]... NUS-WIDE [24]
Dataset Splits | Yes | CIFAR-100 [54] contains 50,000 training images and 10,000 validation images... Something-Something V2 (SSv2) [39]... It consists of 168,913 training samples, 24,777 validation samples and 27,157 testing samples... HMDB51 [55] is composed of 6,849 videos with 51 categories, making a split of 3.5k/1.5k train/val videos.
Hardware Specification | Yes | In this work, we use PyTorch toolkit [68] to conduct all experiments on NVIDIA V100 GPUs.
Software Dependencies | No | The paper states 'we use PyTorch toolkit [68]', but it does not specify a version number for PyTorch or any other software dependencies.
Experiment Setup | Yes | Unless otherwise stated, we use 8×8 GPUs for video experiments and 1×8 GPUs for image experiments. ... For the newly added modules, the weights of down-projection layers are initialized with Kaiming Normal [44], while the biases of the additional networks and the weights of the up-projection layers are configured with zero initialization. ... We trained all models for 40 epochs using the Adam optimizer and a 1-cycle learning rate policy [73]. The maximal learning rate is 0.001.
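
To make the quoted Experiment Setup concrete, below is a minimal PyTorch sketch of the described adapter initialization and the Adam + 1-cycle schedule. The module name AdaptMLP, the embedding/bottleneck widths, the scaling factor, and steps_per_epoch are illustrative assumptions; this is not the authors' released code (see the repository linked above for that).

    import torch
    import torch.nn as nn

    class AdaptMLP(nn.Module):
        # Bottleneck adapter branch (down-projection -> ReLU -> up-projection),
        # scaled and added in parallel to the frozen MLP block of a ViT layer.
        # embed_dim, bottleneck_dim, and scale are assumed example values.
        def __init__(self, embed_dim=768, bottleneck_dim=64, scale=0.1):
            super().__init__()
            self.down = nn.Linear(embed_dim, bottleneck_dim)
            self.act = nn.ReLU()
            self.up = nn.Linear(bottleneck_dim, embed_dim)
            self.scale = scale
            # Initialization as quoted above: Kaiming Normal for the
            # down-projection weights; zeros for all biases and for the
            # up-projection weights.
            nn.init.kaiming_normal_(self.down.weight)
            nn.init.zeros_(self.down.bias)
            nn.init.zeros_(self.up.weight)
            nn.init.zeros_(self.up.bias)

        def forward(self, x):
            return self.scale * self.up(self.act(self.down(x)))

    # Adam optimizer with a 1-cycle learning rate policy, 40 epochs,
    # maximal learning rate 0.001, as stated in the paper.
    # steps_per_epoch=1000 is a placeholder for len(train_loader).
    adapter = AdaptMLP()
    optimizer = torch.optim.Adam(adapter.parameters(), lr=0.001)
    scheduler = torch.optim.lr_scheduler.OneCycleLR(
        optimizer, max_lr=0.001, epochs=40, steps_per_epoch=1000
    )

Note that with the up-projection weights and biases zero-initialized, the adapter branch outputs zero at the start of training, so fine-tuning begins exactly from the frozen pretrained model's behavior.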