AdaptFormer: Adapting Vision Transformers for Scalable Visual Recognition
Authors: Shoufa Chen, Chongjian GE, Zhan Tong, Jiangliu Wang, Yibing Song, Jue Wang, Ping Luo
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on five image and video datasets show that AdaptFormer largely improves ViTs in the target domains. |
| Researcher Affiliation | Collaboration | Shoufa Chen¹, Chongjian Ge¹, Zhan Tong², Jiangliu Wang², Yibing Song², Jue Wang², Ping Luo¹; ¹The University of Hong Kong, ²Tencent AI Lab |
| Pseudocode | No | The paper does not contain any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is available at https://github.com/ShoufaChen/AdaptFormer. |
| Open Datasets | Yes | Image domain : CIFAR-100 [54]... Street View House Numbers (SVHN) [37]... The Food-101 [9] dataset... Video domain : Something-Something V2 (SSv2) [39]... HMDB51 [55]... NUS-WIDE [24] |
| Dataset Splits | Yes | CIFAR-100 [54] contains 50,000 training images and 10,000 validation images... Something-Something V2 (SSv2) [39]... It consists of 168,913 training samples, 24,777 validation samples and 27,157 testing samples... HMDB51 [55] is composed of 6,849 videos with 51 categories, making a split of 3.5k/1.5k train/val videos. |
| Hardware Specification | Yes | In this work, we use PyTorch toolkit [68] to conduct all experiments on NVIDIA V100 GPUs. |
| Software Dependencies | No | The paper states 'we use PyTorch toolkit [68]', but it does not specify a version number for PyTorch or any other software dependencies. |
| Experiment Setup | Yes | Unless otherwise stated, we use 8×8 GPUs for video experiments and 1×8 GPUs for image experiments. ... For the newly added modules, the weights of down-projection layers are initialized with Kaiming Normal [44], while the biases of the additional networks and the weights of the up-projection layers are configured with zero initialization. ... We trained all models for 40 epochs using the Adam optimizer and 1-cycle learning rate policy [73]. The maximal learning rate is 0.001. |
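The initialization scheme quoted above (Kaiming Normal on the down-projection, zeros on the up-projection and all biases) makes each adapter start as an identity residual branch. A minimal PyTorch sketch of such a bottleneck adapter, together with the reported Adam + 1-cycle schedule, might look like the following; the module name, dimensions, and steps-per-epoch are illustrative assumptions, not the authors' exact code:

```python
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    """Hypothetical sketch of a bottleneck adapter with the paper's init scheme."""
    def __init__(self, dim=768, bottleneck=64):  # dims assumed, not from the paper
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.act = nn.ReLU()
        self.up = nn.Linear(bottleneck, dim)
        # Down-projection weights: Kaiming Normal initialization.
        nn.init.kaiming_normal_(self.down.weight)
        # Up-projection weights and all biases: zero initialization,
        # so the adapter initially contributes nothing to the residual sum.
        nn.init.zeros_(self.up.weight)
        nn.init.zeros_(self.down.bias)
        nn.init.zeros_(self.up.bias)

    def forward(self, x):
        # Residual form: at init, output == input exactly.
        return x + self.up(self.act(self.down(x)))

model = BottleneckAdapter()
# Reported recipe: Adam, 1-cycle LR policy, max LR 0.001, 40 epochs.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
steps_per_epoch = 100  # illustrative; depends on dataset and batch size
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer, max_lr=1e-3, total_steps=40 * steps_per_epoch)
```

The zero-initialized up-projection is the key design choice: training starts from the frozen backbone's behavior and the adapter deviates from it only as its weights are learned.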