Contrastive Tuning: A Little Help to Make Masked Autoencoders Forget
Authors: Johannes Lehner, Benedikt Alkin, Andreas Fürst, Elisabeth Rumetshofer, Lukas Miklautz, Sepp Hochreiter
AAAI 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments and Analysis |
| Researcher Affiliation | Academia | ¹ELLIS Unit Linz and LIT AI Lab, Institute for Machine Learning, Johannes Kepler University, Linz, Austria; ²Faculty of Computer Science, University of Vienna, Vienna, Austria; ³UniVie Doctoral School Computer Science, University of Vienna; ⁴Institute of Advanced Research in Artificial Intelligence (IARAI); lehner@ml.jku.at, alkin@ml.jku.at, lukas.miklautz@univie.ac.at |
| Pseudocode | No | The paper describes methods in prose and figures (e.g., Figure 3), but does not contain a structured pseudocode or algorithm block. |
| Open Source Code | Yes | We provide access to our code, model checkpoints and supplement on our project page: github.com/ml-jku/MAE-CT. |
| Open Datasets | Yes | Evaluation: We evaluate our approach via image classification on ImageNet (Deng et al. 2009). |
| Dataset Splits | Yes | Evaluation: We evaluate our approach via image classification on ImageNet (Deng et al. 2009), where we vary the number of used labels from 100% down to a single label per class. ... For evaluating the representation using 100% of the labels, we train a linear probe and a k-NN classifier. With 10% and 1% of the labels, we fine-tune the encoder, and in the extreme low-shot settings (<1% labels), we report the accuracy of a logistic regression classifier averaged over three splits. The detailed protocols can be found in Supplement B. (An illustrative k-NN probe sketch follows the table.) |
| Hardware Specification | Yes | We acknowledge the EuroHPC Joint Undertaking for awarding us access to Karolina at IT4Innovations, Czech Republic and to MeluXina at LuxProvide, Luxembourg. |
| Software Dependencies | No | The paper describes various model parameters, optimizers, and training schedules (e.g., 'MAE pre-training. We train for 1600 epochs with a learning rate of 1.5e-4 and use the normalize pixels variant of the MAE loss'), but does not specify software dependencies with version numbers. |
| Experiment Setup | Yes | Implementation Details: We outline the most important implementation details and provide all further information in Supplement A. MAE pre-training: We train for 1600 epochs with a learning rate of 1.5e-4 and use the normalize pixels variant of the MAE loss, which applies a patch-wise normalization to the target pixels before the mean-squared-error loss. NNCLR initialization: Following (Dwibedi et al. 2021), we use a 3-layer MLP as projector, a 2-layer MLP as predictor and a queue Q of length 65536. To initialize the NNCLR head, we train for 20 epochs on the output of the fully frozen pre-trained MAE encoder with a learning rate of 1e-4, a temperature τ of 0.15 and the default top1-NN lookup. Contrastive tuning: We use a learning rate of 1e-4 and apply layer-wise learning rate decay (Clark et al. 2020) with decay factor 0.65 to the upper half of the ViT blocks while freezing the lower half. For MAE-CTmin, we train ViT-B/L for 20 epochs and ViT-H for 30 epochs. For MAE-CTaug, we train ViT-B for 80 epochs and ViT-L/H for 40 epochs. (Illustrative sketches of the reconstruction loss, the NNCLR objective, the layer-wise decay, and the k-NN probe follow the table.) |
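The Experiment Setup row quotes the "normalize pixels" variant of the MAE loss: target patches are normalized patch-wise before the mean-squared-error. Below is a minimal sketch of that computation, not the authors' code; the tensor names and shapes are assumptions.

```python
import torch

def normalized_pixel_mse(pred, target_patches, mask, eps=1e-6):
    """MSE on patch-wise normalized target pixels (the 'normalize pixels' MAE variant).

    Assumed shapes:
      pred, target_patches: (batch, num_patches, patch_dim)
      mask: (batch, num_patches), 1 for masked (reconstructed) patches, 0 otherwise.
    """
    # Normalize each target patch to zero mean and unit variance.
    mean = target_patches.mean(dim=-1, keepdim=True)
    var = target_patches.var(dim=-1, keepdim=True)
    target = (target_patches - mean) / (var + eps).sqrt()

    # Mean-squared error per patch, averaged only over the masked patches.
    loss = ((pred - target) ** 2).mean(dim=-1)
    return (loss * mask).sum() / mask.sum()
```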
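The NNCLR initialization row describes a projector/predictor head with a queue of length 65536, temperature 0.15, and top1-NN lookup. The sketch below illustrates the core NNCLR objective (Dwibedi et al. 2021) under those settings; the function name, shapes, and single-direction loss are simplifying assumptions (the full method symmetrizes over the two views and maintains the queue separately).

```python
import torch
import torch.nn.functional as F

def nnclr_loss(z, p, queue, temperature=0.15):
    """Minimal NNCLR-style loss: replace the projector output z of one view by its
    top-1 nearest neighbour from the queue and contrast it against the predictor
    output p of the other view.

    Assumed shapes: z, p: (batch, dim); queue: (queue_len, dim), e.g. 65536 x dim.
    """
    z = F.normalize(z, dim=-1)
    p = F.normalize(p, dim=-1)
    queue = F.normalize(queue, dim=-1)

    # Top-1 nearest-neighbour lookup in the queue (no gradient flows through it).
    nn_idx = (z @ queue.T).argmax(dim=-1)
    nn = queue[nn_idx].detach()                 # (batch, dim)

    # InfoNCE: the neighbour of sample i should match the prediction for sample i.
    logits = nn @ p.T / temperature             # (batch, batch)
    labels = torch.arange(z.size(0), device=z.device)
    return F.cross_entropy(logits, labels)
```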
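The contrastive-tuning row specifies layer-wise learning rate decay (factor 0.65) on the upper half of the ViT blocks with the lower half frozen. The following sketch shows one way to build such optimizer parameter groups; it covers only the transformer blocks (not patch embedding or the head) and the helper name is hypothetical.

```python
import torch

def contrastive_tuning_param_groups(encoder_blocks, base_lr=1e-4, decay=0.65):
    """Freeze the lower half of the ViT blocks and apply layer-wise lr decay to the
    upper half (topmost block gets base_lr, each block below is scaled by `decay`).

    `encoder_blocks` is assumed to be an ordered list of nn.Module blocks, bottom to top.
    """
    num_blocks = len(encoder_blocks)
    frozen = encoder_blocks[: num_blocks // 2]
    tuned = encoder_blocks[num_blocks // 2:]

    # Freeze the lower half entirely.
    for block in frozen:
        for p in block.parameters():
            p.requires_grad = False

    # Upper half: deeper blocks keep the full lr, shallower blocks are decayed.
    groups = []
    for depth, block in enumerate(tuned):
        scale = decay ** (len(tuned) - 1 - depth)
        groups.append({"params": list(block.parameters()), "lr": base_lr * scale})
    return groups

# Usage with a hypothetical `vit` whose blocks live in `vit.blocks`:
# optimizer = torch.optim.AdamW(contrastive_tuning_param_groups(list(vit.blocks)))
```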
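The Dataset Splits row mentions a k-NN classifier on the frozen representation for the 100%-label evaluation. As a rough illustration, here is a similarity-weighted k-NN probe on precomputed features; the values of k and the temperature are assumptions, not numbers reported in the paper.

```python
import torch
import torch.nn.functional as F

def knn_probe(train_feats, train_labels, test_feats, k=20, temperature=0.07):
    """Weighted k-NN classification on frozen encoder features.

    Assumed inputs: train_feats (N, dim), train_labels (N,) int64, test_feats (M, dim).
    Returns predicted class indices of shape (M,).
    """
    train_feats = F.normalize(train_feats, dim=-1)
    test_feats = F.normalize(test_feats, dim=-1)
    num_classes = int(train_labels.max()) + 1

    sims = test_feats @ train_feats.T            # cosine similarities (M, N)
    topk_sims, topk_idx = sims.topk(k, dim=-1)
    topk_labels = train_labels[topk_idx]         # (M, k)

    # Similarity-weighted vote over the k nearest neighbours.
    weights = (topk_sims / temperature).exp().unsqueeze(-1)      # (M, k, 1)
    one_hot = F.one_hot(topk_labels, num_classes).float()        # (M, k, C)
    scores = (weights * one_hot).sum(dim=1)                      # (M, C)
    return scores.argmax(dim=-1)
```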