Masked Distillation Advances Self-Supervised Transformer Architecture Search

Authors: Caixia Yan, Xiaojun Chang, Zhihui Li, Lina Yao, Minnan Luo, Qinghua Zheng

ICLR 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments demonstrate that the searched architectures can achieve state-of-the-art accuracy on CIFAR-10, CIFAR-100, and ImageNet datasets even without using manual labels.
Researcher Affiliation | Academia | Caixia Yan, School of Computer Science and Technology, MOEKLINNS Laboratory, Xi'an Jiaotong University, yancaixia@xjtu.edu.cn; Xiaojun Chang, University of Science and Technology of China & Mohamed bin Zayed University of Artificial Intelligence, cxj273@gmail.com; Zhihui Li, School of Information Science and Technology, University of Science and Technology of China, zhihuilics@gmail.com; Lina Yao, CSIRO's Data61 & University of New South Wales, lina.yao@data61.csiro.au; Minnan Luo, School of Computer Science and Technology, MOEKLINNS Laboratory, Xi'an Jiaotong University, minnluo@xjtu.edu.cn; Qinghua Zheng, School of Computer Science and Technology, MOEKLINNS Laboratory, Xi'an Jiaotong University, qhzheng@mail.xjtu.edu.cn
Pseudocode | Yes | Algorithm 1: Self-supervised Supernet Training in MaskTAS. (The algorithm itself is not reproduced on this page; a hedged sketch of such a training step follows the table.)
Open Source Code | No | The paper does not include an explicit statement about the release of its source code or a direct link to a code repository.
Open Datasets | Yes | We evaluate the effectiveness of the proposed method over the large-scale ImageNet (Russakovsky et al., 2015) dataset, which contains 1.28 million images in 1000 categories collected for the image classification task. We also experiment on other classification tasks, including CIFAR-10 (Krizhevsky et al., 2009), CIFAR-100 (Krizhevsky et al., 2009), PETS (Parkhi et al., 2012) and Flowers (Nilsback & Zisserman, 2008).
Dataset Splits | Yes | During supernet training, each supernet is pre-trained under a 100-epoch schedule on the ImageNet-1K training set. ... During architecture search, we employ the ImageNet validation set for model testing. (A data-loading sketch that follows these splits is given after the table.)
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., CPU, GPU models, memory) used for running the experiments.
Software Dependencies | No | The paper does not provide specific version numbers for ancillary software components or libraries (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup | Yes | During supernet training, each supernet is pre-trained under a 100-epoch schedule on the ImageNet-1K training set. ... We adopt a cosine decay schedule with a warm-up for 20 epochs. We adopt the Adam optimizer with a weight decay of 0.05. The size of the input image is set to 224×224 and the masking ratio is set to 90% by default. ... We perform evolutionary search for 20 epochs to get the optimal architecture, where the population size Np is set to 50. ... We fine-tune the searched architecture for 100 epochs with a batch size of 2048, a learning rate of 5e-3, and a drop path rate of 0.1. (A configuration sketch based on these values follows the table.)
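
The pseudocode row points to Algorithm 1 (self-supervised supernet training in MaskTAS), which is not reproduced on this page. Purely as a hypothetical illustration of a masked-distillation training step at the quoted 90% masking ratio, the sketch below masks most patch tokens, feeds the masked sequence to a sampled student subnet, and regresses its outputs onto a frozen teacher's features at the masked positions. The module names, the MSE loss, and the placeholder encoders are assumptions, not the paper's Algorithm 1.

```python
# Hypothetical masked-distillation step; NOT the paper's Algorithm 1.
import torch
from torch import nn
import torch.nn.functional as F

def masked_distillation_step(student: nn.Module,
                             teacher: nn.Module,
                             tokens: torch.Tensor,      # (B, N, D) patch embeddings
                             mask_ratio: float = 0.90) -> torch.Tensor:
    B, N, D = tokens.shape
    num_masked = int(N * mask_ratio)

    # Randomly choose 90% of the patch positions to mask (per sample).
    noise = torch.rand(B, N, device=tokens.device)
    masked_idx = noise.argsort(dim=1)[:, :num_masked]            # (B, num_masked)
    mask = torch.zeros(B, N, dtype=torch.bool, device=tokens.device)
    mask.scatter_(1, masked_idx, True)

    # Student (a sampled subnet of the supernet) sees masked tokens only;
    # the frozen teacher sees the full sequence and provides targets.
    student_in = tokens.masked_fill(mask.unsqueeze(-1), 0.0)
    with torch.no_grad():
        target = teacher(tokens)                                 # (B, N, D)
    pred = student(student_in)                                   # (B, N, D)

    # Regress student features onto teacher features at masked positions.
    return F.mse_loss(pred[mask], target[mask])

# Usage with placeholder encoders (real ones would be ViT subnets):
student = nn.Linear(192, 192)
teacher = nn.Linear(192, 192)
loss = masked_distillation_step(student, teacher, torch.randn(4, 196, 192))
```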
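The dataset rows quote public benchmarks (ImageNet-1K, CIFAR-10/100, PETS, Flowers) and the split convention: the ImageNet-1K training set for supernet pre-training, and the ImageNet validation set for testing candidate architectures during search. A minimal loading sketch under the assumption of a standard torchvision/ImageFolder layout follows; the local paths are placeholders, not paths from the paper.

```python
# Hypothetical data loading; assumes a standard ImageNet-1K folder layout.
from torchvision import datasets, transforms

IMG_SIZE = 224  # quoted input resolution

train_tf = transforms.Compose([
    transforms.RandomResizedCrop(IMG_SIZE),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])
val_tf = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(IMG_SIZE),
    transforms.ToTensor(),
])

# ImageNet-1K training split: used for supernet pre-training (quoted).
pretrain_set = datasets.ImageFolder("/path/to/imagenet/train", transform=train_tf)
# ImageNet validation split: used to score candidates during the search (quoted).
search_val_set = datasets.ImageFolder("/path/to/imagenet/val", transform=val_tf)

# Smaller transfer benchmarks named in the paper; PETS and Flowers would be loaded analogously.
cifar10 = datasets.CIFAR10("./data", train=True, download=True, transform=train_tf)
cifar100 = datasets.CIFAR100("./data", train=True, download=True, transform=train_tf)
```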
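The experiment-setup row quotes concrete hyperparameters: a 100-epoch pre-training schedule, a 20-epoch warm-up into cosine decay, Adam with weight decay 0.05, 224×224 inputs, and a 90% masking ratio. A minimal sketch of how that schedule could be wired up in PyTorch is shown below; the supernet placeholder, the scheduler composition, and the pre-training learning rate are assumptions, since the paper releases no code and only quotes 5e-3 as the fine-tuning learning rate.

```python
# Hypothetical reconstruction of the quoted supernet pre-training schedule.
import torch
from torch import nn, optim
from torch.optim.lr_scheduler import LinearLR, CosineAnnealingLR, SequentialLR

EPOCHS = 100          # 100-epoch pre-training schedule (quoted)
WARMUP_EPOCHS = 20    # warm-up before cosine decay (quoted)
IMG_SIZE = 224        # input resolution (quoted)
MASK_RATIO = 0.90     # default masking ratio (quoted)
BASE_LR = 5e-3        # quoted only for fine-tuning; reused here as an assumption

supernet = nn.Linear(IMG_SIZE * IMG_SIZE * 3, 1000)  # placeholder for the ViT supernet

# Adam with weight decay 0.05, as quoted in the setup.
optimizer = optim.Adam(supernet.parameters(), lr=BASE_LR, weight_decay=0.05)

# Linear warm-up for 20 epochs, then cosine decay over the remaining 80 epochs.
scheduler = SequentialLR(
    optimizer,
    schedulers=[
        LinearLR(optimizer, start_factor=1e-3, total_iters=WARMUP_EPOCHS),
        CosineAnnealingLR(optimizer, T_max=EPOCHS - WARMUP_EPOCHS),
    ],
    milestones=[WARMUP_EPOCHS],
)

for epoch in range(EPOCHS):
    # ... one pass over the ImageNet-1K training set with 90%-masked inputs ...
    scheduler.step()
```

The quoted evolutionary search (20 epochs, population size 50) and the fine-tuning recipe (100 epochs, batch size 2048, learning rate 5e-3, drop path 0.1) are separate stages and are not sketched here.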