Adapting Self-Supervised Vision Transformers by Probing Attention-Conditioned Masking Consistency

Authors: Viraj Prabhu, Sriram Yenamandra, Aaditya Singh, Judy Hoffman

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our simple approach leads to consistent performance gains over competing methods that use ViTs and self-supervised initializations on standard object recognition benchmarks. Our code is available at https://github.com/virajprabhu/PACMAC. ... We evaluate PACMAC on three classification benchmarks for domain adaptation... Tables 1, 2, and 3 present results.
Researcher Affiliation | Academia | Viraj Prabhu, Sriram Yenamandra, Aaditya Singh, Judy Hoffman ({virajp,sriramy,asingh,judy}@gatech.edu), Georgia Institute of Technology
Pseudocode | Yes | Algorithm 1: Attention-conditioned Masking ... Algorithm 2: PACMAC Optimization
Open Source Code | Yes | Our code is available at https://github.com/virajprabhu/PACMAC.
Open Datasets | Yes | We evaluate PACMAC on three classification benchmarks for domain adaptation: i) OfficeHome [30]... ii) DomainNet [31]... iii) VisDA2017 [32]...
Dataset Splits | Yes | In unsupervised domain adaptation (UDA) we are given access to labeled source instances (x_S, y_S) ∼ P_S(X, Y) and unlabeled target instances x_T ∼ P_T(X)... For a target instance x_T ∼ P_T, we generate a committee of k masked versions. ... Did you specify all the training details (e.g., data splits, hyperparameters, how they were chosen)? [Yes] See Sec. 4.2 and supplementary.
Hardware Specification | No | The paper mentions "All experiments use PyTorch [48]" but does not specify any hardware such as GPUs or CPUs. The checklist indicates that the information is in the supplementary material, but not in the main paper: "Did you include the total amount of compute and the type of resources used (e.g., type of GPUs, internal cluster, or cloud provider)? [Yes] See supplementary."
Software Dependencies | No | All experiments use PyTorch [48]. ... We use the AdamW [46] optimizer. ... We use RandAugment [47]. No specific version numbers are provided for these software libraries, which is required for a reproducible description.
Experiment Setup | Yes | We pretrain on the combined source and target domain for 800 epochs (MAE) and 200 epochs (DINO). For pretraining, we linearly scale the learning rate to 4×10^-4 (MAE) and 5×10^-5 (DINO) during a 40-epoch warmup phase followed by a cosine decay. We use the AdamW [46] optimizer. For PACMAC, we use k = 2, m_r = 0.75, T = 50%, and α = 0.1. We use RandAugment [47] with N = 3 and M = 4.0 during pretraining and N = 1 and M = 2.0 during DA. On OfficeHome and DomainNet, we finetune on the source and adapt for 100 epochs each, and perform 10 epochs of each phase on VisDA. We use a learning rate of 2×10^-4 and weight decay of 0.05.
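
As context for the Pseudocode row, the sketch below gives one plausible reading of Algorithm 1 (Attention-conditioned Masking): bias each of the k patch masks toward patches that receive high attention from the pretrained ViT. The function name, the use of class-token attention scores as input, and the exact sampling rule are assumptions, not the authors' released implementation.

```python
import torch

def attention_conditioned_masks(attn_scores, k=2, mask_ratio=0.75):
    """Sketch: build k patch masks biased toward highly attended patches.

    attn_scores: (num_patches,) attention of the class token over image patches.
    Returns a (k, num_patches) boolean tensor where True marks a masked patch.
    """
    num_patches = attn_scores.shape[0]
    num_masked = int(mask_ratio * num_patches)
    probs = attn_scores / attn_scores.sum()  # normalize attention to a distribution
    masks = torch.zeros(k, num_patches, dtype=torch.bool)
    for i in range(k):
        # Sample patches to mask without replacement, favoring salient regions.
        masked_idx = torch.multinomial(probs, num_masked, replacement=False)
        masks[i, masked_idx] = True
    return masks
```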
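
The Dataset Splits row quotes the committee of k masked views used to decide which unlabeled target instances to self-train on. A minimal sketch of that selection test follows; it assumes a model whose forward pass accepts an optional patch_mask argument, and it treats agreement between every masked view and the unmasked prediction as the reliability criterion, which is one plausible reading of the paper's consistency check.

```python
import torch

def is_reliable_target(model, image, masks):
    """Sketch: keep a target image for self-training only if all k masked
    views predict the same class as the unmasked view."""
    model.eval()
    with torch.no_grad():
        pseudo_label = model(image.unsqueeze(0)).argmax(dim=-1)  # unmasked prediction
        masked_preds = torch.stack(
            [model(image.unsqueeze(0), patch_mask=m).argmax(dim=-1) for m in masks]
        )
    consistent = bool((masked_preds == pseudo_label).all())
    return consistent, pseudo_label.item()
```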
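
The Experiment Setup row maps onto a configuration roughly like the one below. The values are those quoted above; the dictionary layout, key names, and the AdamW constructor call are illustrative assumptions rather than the authors' training script.

```python
import torch

# Values quoted in the Experiment Setup row; structure and names are assumed.
config = {
    "pretrain_epochs": {"mae": 800, "dino": 200},
    "pretrain_peak_lr": {"mae": 4e-4, "dino": 5e-5},  # reached after a 40-epoch warmup, then cosine decay
    "pacmac": {"k": 2, "mask_ratio": 0.75, "threshold": 0.50, "alpha": 0.1},
    "randaugment": {"pretrain": {"N": 3, "M": 4.0}, "adapt": {"N": 1, "M": 2.0}},
    "adapt_epochs": 100,  # 10 per phase on VisDA
    "adapt_lr": 2e-4,
    "weight_decay": 0.05,
}

def make_optimizer(model):
    # AdamW with the adaptation-phase learning rate and weight decay quoted above.
    return torch.optim.AdamW(
        model.parameters(),
        lr=config["adapt_lr"],
        weight_decay=config["weight_decay"],
    )
```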
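
Since the Hardware Specification and Software Dependencies rows flag missing version and hardware details, a snippet like the following (not from the paper) is one way to record the environment alongside reported results:

```python
import platform
import torch

# Log the software and hardware environment for reproducibility.
print("python :", platform.python_version())
print("torch  :", torch.__version__)
print("cuda   :", torch.version.cuda)
if torch.cuda.is_available():
    print("gpu    :", torch.cuda.get_device_name(0))
```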