Denoising Masked Autoencoders Help Robust Classification

Authors: Quanlin Wu, Hang Ye, Yuntian Gu, Huishuai Zhang, Liwei Wang, Di He

ICLR 2023

Reproducibility assessment (each entry lists the variable, the result, and the supporting LLM response):
Research Type: Experimental. In this section, we empirically evaluate our proposed DMAE on ImageNet and CIFAR-10 datasets. We also study the influence of different hyperparameters and training strategies on the final model performance. All experiments are repeated ten times with different seeds. Average performance is reported, and details can be found in the appendix. Results. We list the detailed results of our model and representative baseline methods in Table 2.
Researcher Affiliation: Collaboration. Quanlin Wu (1), Hang Ye (1), Yuntian Gu (1), Huishuai Zhang (2), Liwei Wang (3), Di He (3); (1) Peking University, (2) Microsoft Research Asia, (3) National Key Lab of General AI, School of Artificial Intelligence, Peking University. Contact: {quanlin, yehang, dihe}@pku.edu.cn, guyuntian@stu.pku.edu.cn, huzhang@microsoft.com, wanglw@cis.pku.edu.cn
Pseudocode: No. The paper does not contain any clearly labeled pseudocode or algorithm blocks.
Open Source Code: Yes. Models and code are available at https://github.com/quanlin-wu/dmae.
Open Datasets: Yes. Following He et al. (2022); Xie et al. (2022), we use ImageNet-1k as the pre-training corpus, which contains 1.28 million images. We also demonstrate that the pre-trained model has good transferability to the CIFAR-10 dataset.
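A minimal loading sketch for these two datasets, assuming a standard torchvision setup, is given below; the ImageNet directory path is a placeholder and the transform only mirrors the random resizing and cropping augmentation quoted in the experiment setup entry, not the authors' full pipeline.

    # Hypothetical loading sketch (not the authors' code): torchvision datasets for
    # ImageNet-1k (class-per-folder layout assumed) and CIFAR-10.
    from torchvision import datasets, transforms

    pretrain_transform = transforms.Compose([
        transforms.RandomResizedCrop(224),  # random resizing and cropping, as reported
        transforms.ToTensor(),
    ])

    # ImageNet-1k pre-training corpus (~1.28 million images); the path is a placeholder.
    imagenet_train = datasets.ImageFolder("/path/to/imagenet/train",
                                          transform=pretrain_transform)

    # CIFAR-10, used to study transferability of the pre-trained model.
    cifar10_train = datasets.CIFAR10(root="./data", train=True, download=True,
                                     transform=transforms.ToTensor())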
Dataset Splits: Yes. The result is averaged over 1,000 images uniformly selected from the ImageNet validation set, following Carlini et al. (2022). We draw n = 100,000 noise samples and report results averaged over the entire CIFAR-10 test set.
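The subsampling and noise drawing described in this entry can be sketched as follows, assuming a generic randomized-smoothing-style Monte Carlo evaluation; model, num_classes, and the batch size are illustrative placeholders, not the authors' evaluation code, and the step that turns class counts into certified accuracy is omitted.

    # Hedged sketch of the evaluation subsampling and noise drawing described above.
    import torch

    def uniform_subset_indices(dataset_size, num_samples):
        """Uniformly spaced indices, e.g. 1,000 images out of the 50,000-image
        ImageNet validation set (every 50th image)."""
        step = dataset_size // num_samples
        return list(range(0, dataset_size, step))[:num_samples]

    @torch.no_grad()
    def noise_class_counts(model, x, sigma, n, num_classes, batch_size=1000):
        """Class counts for one image x (shape C x H x W) under Gaussian noise
        N(0, sigma^2 I), drawing n noise samples in batches
        (n = 100,000 per image in the CIFAR-10 evaluation)."""
        counts = torch.zeros(num_classes, dtype=torch.long)
        remaining = n
        while remaining > 0:
            b = min(batch_size, remaining)
            noise = torch.randn((b, *x.shape), device=x.device) * sigma
            preds = model(x.unsqueeze(0) + noise).argmax(dim=1)
            counts += torch.bincount(preds.cpu(), minlength=num_classes)
            remaining -= b
        return counts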
Hardware Specification: No. The paper does not explicitly describe any specific hardware components (e.g., GPU models, CPU models, or cloud computing instance types) used for the experiments.
Software Dependencies: No. The paper mentions software components like the AdamW optimizer and Transformer layers but does not specify version numbers for any libraries, frameworks (e.g., PyTorch, TensorFlow), or programming languages (e.g., Python).
Experiment Setup: Yes. For the pre-training of the two DMAE models, we set the masking ratio to 0.75 following He et al. (2022). The noise level σ is set to 0.25. Random resizing and cropping are used as data augmentation to avoid overfitting. The ViT-B and ViT-L models are pre-trained for 1100 and 1600 epochs, where the batch size is set to 4096. We use the AdamW optimizer with (β1, β2) = (0.9, 0.95), and adjust the learning rate to 1.5e-4. The weight decay factor is set to 0.05. In the fine-tuning stage, we add a linear prediction head on top of the encoder for classification. The ViT-B model is fine-tuned for 100 epochs, while the ViT-L is fine-tuned for 50 epochs. Both settings use AdamW with (β1, β2) = (0.9, 0.999). The weight decay factor is set to 0.05. We set the base learning rate to 5e-4 for ViT-B and 1e-3 for ViT-L. For the consistency regularization loss terms, we set the hyperparameters λ = 2.0 and µ = 0.5 for σ ∈ {0.25, 0.5}, and set λ = 2.0 and µ = 0.1 for σ = 1.0.
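For reference, the hyperparameters quoted in this entry can be collected into a small configuration sketch; the dictionaries below only transcribe the reported values, and the build_pretrain_optimizer helper with its model argument is an illustrative placeholder rather than the released training code.

    # Hedged transcription of the reported DMAE hyperparameters (not the authors' config files).
    import torch

    PRETRAIN_CFG = {
        "masking_ratio": 0.75,
        "noise_sigma": 0.25,
        "batch_size": 4096,
        "epochs": {"vit_b": 1100, "vit_l": 1600},
        "optimizer": {"betas": (0.9, 0.95), "base_lr": 1.5e-4, "weight_decay": 0.05},
    }

    FINETUNE_CFG = {
        "epochs": {"vit_b": 100, "vit_l": 50},
        "optimizer": {"betas": (0.9, 0.999), "weight_decay": 0.05},
        "base_lr": {"vit_b": 5e-4, "vit_l": 1e-3},
        # Consistency-regularization weights (lambda, mu) per noise level sigma.
        "lambda_mu": {0.25: (2.0, 0.5), 0.5: (2.0, 0.5), 1.0: (2.0, 0.1)},
    }

    def build_pretrain_optimizer(model):
        """AdamW with the pre-training settings above; learning-rate scheduling is omitted."""
        opt = PRETRAIN_CFG["optimizer"]
        return torch.optim.AdamW(model.parameters(), lr=opt["base_lr"],
                                 betas=opt["betas"], weight_decay=opt["weight_decay"])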