Auto-Prox: Training-Free Vision Transformer Architecture Search via Automatic Proxy Discovery

Authors: Zimian Wei, Peijie Dong, Zheng Hui, Anggeng Li, Lujun Li, Menglong Lu, Hengyue Pan, Dongsheng Li

AAAI 2024

Reproducibility Variable Result LLM Response
Research Type Experimental Extensive experiments demonstrate that our method generalizes well to different datasets and achieves state-of-the-art results in both ranking correlation and final accuracy. We conduct extensive experiments on CIFAR-100, Flowers, Chaoyang (Zhu et al. 2021), and ImageNet-1K to validate the superiority of our proposed method.
Researcher Affiliation Collaboration 1 National University of Defense Technology; 2 The Hong Kong University of Science and Technology (Guangzhou); 3 Columbia University; 4 Huawei; 5 The Hong Kong University of Science and Technology
Pseudocode Yes
Algorithm 1: Evolutionary Search for Auto-Prox
Input: Search space S, population P, max iteration T, sample ratio r, sampled pool R, top-k k, margin m.
Output: Auto-Prox with the best JCM.
1: P := Initialize_Population();
2: Sample pool R := ∅;
3: for i = 1, 2, ..., T do
4:   Clear sample pool R := ∅;
5:   Randomly select R ⊂ P;
6:   Candidates G_i^k := Get_Topk(R, k);
7:   Parent G_i^p := Random_Select(G_i^k);
8:   Mutate G_i^m := MUTATE(G_i^p);
9:   // Elitism-Preserve Strategy.
10:  if JCM(G_i^m) − JCM(G_i^p) ≥ m then
11:    Append G_i^m to P;
12:  else
13:    Go to line 8;
14:  end if
15:  Remove the zero-cost proxy with the lowest JCM;
16: end for
Open Source Code Yes Codes can be found at https://github.com/lilujunai/Auto-Prox-AAAI24.
Open Datasets Yes First, we build the ViT-Bench-101, which involves different ViT candidates and their actual performance on multiple datasets. For the tiny datasets, we employ CIFAR-100 (Krizhevsky 2009), Flowers (Nilsback and Zisserman 2008), and Chaoyang (Zhu et al. 2021), while for the large-scale datasets, we focus on ImageNet-1K.
Dataset Splits Yes We partition the whole ViT-Bench-101 dataset into a validation set (60%) for proxy searching and a test set (40%) for proxy evaluation. There is no overlap between these two sets.
Hardware Specification Yes The zero-cost proxy search process is conducted on a single NVIDIA A40 GPU and occupies the memory of only one ViT.
Software Dependencies No No specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow, CUDA versions) are mentioned for the experimental setup.
Experiment Setup Yes In the evolutionary search process, we employ a population size of P = 20, and the total number of iterations T is set to 200. When conducting mutation, the probability of mutation for a single node in a zero-cost proxy representation is set to 0.5. The margin m in the Elitism-Preserve Strategy is 0.1.
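The evolutionary search in Algorithm 1, with the hyperparameters reported in the experiment setup (population P = 20, T = 200 iterations, node mutation probability 0.5, margin m = 0.1), can be sketched as a short runnable loop. This is a minimal illustrative sketch, not the authors' implementation: the proxy representation (a flat list of primitive ops from a hypothetical set `OPS`) and the `jcm` scoring stub are assumptions standing in for the paper's computation-graph proxies and their Judged Correlation Metric on ViT-Bench-101.

```python
import random

OPS = ["relu", "abs", "log", "norm", "sum"]  # hypothetical primitive set

def jcm(proxy):
    # Stub score standing in for the real JCM, which would measure the
    # ranking correlation of the proxy's predictions on ViT-Bench-101.
    return sum(len(op) for op in proxy) / (10.0 * len(proxy))

def mutate(proxy, p=0.5):
    # Mutate each node of the proxy with probability p (paper uses p = 0.5).
    return [random.choice(OPS) if random.random() < p else op for op in proxy]

def evolutionary_search(pop_size=20, iters=200, sample_ratio=0.5,
                        topk=5, margin=0.1, proxy_len=4):
    population = [[random.choice(OPS) for _ in range(proxy_len)]
                  for _ in range(pop_size)]
    for _ in range(iters):
        # Randomly sample a pool R from the population (line 5).
        pool = random.sample(population,
                             max(1, int(sample_ratio * len(population))))
        # Take the top-k of the pool by JCM and pick a parent (lines 6-7).
        candidates = sorted(pool, key=jcm, reverse=True)[:topk]
        parent = random.choice(candidates)
        # Elitism-Preserve Strategy (lines 10-14): accept a mutant only if
        # it beats its parent by at least the margin m; retries are bounded
        # here so this toy loop always terminates.
        for _ in range(50):
            child = mutate(parent)
            if jcm(child) - jcm(parent) >= margin:
                population.append(child)
                # Drop the proxy with the lowest JCM (line 15).
                population.remove(min(population, key=jcm))
                break
    return max(population, key=jcm)
```

Note that `sample_ratio`, `topk`, and `proxy_len` are illustrative defaults; the paper specifies only P, T, the mutation probability, and m.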