Adapting Self-Supervised Vision Transformers by Probing Attention-Conditioned Masking Consistency
Authors: Viraj Prabhu, Sriram Yenamandra, Aaditya Singh, Judy Hoffman
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our simple approach leads to consistent performance gains over competing methods that use ViTs and self-supervised initializations on standard object recognition benchmarks. Our code is available at https://github.com/virajprabhu/PACMAC. ... We evaluate PACMAC on three classification benchmarks for domain adaptation... Tables 1, 2, and 3 present results. |
| Researcher Affiliation | Academia | Viraj Prabhu, Sriram Yenamandra, Aaditya Singh, Judy Hoffman, {virajp,sriramy,asingh,judy}@gatech.edu, Georgia Institute of Technology |
| Pseudocode | Yes | Algorithm 1 Attention-conditioned Masking ... Algorithm 2 PACMAC Optimization (a hedged sketch of the masking step follows the table) |
| Open Source Code | Yes | Our code is available at https://github.com/virajprabhu/PACMAC. |
| Open Datasets | Yes | We evaluate PACMAC on three classification benchmarks for domain adaptation: i) OfficeHome [30]... ii) DomainNet [31]... iii) VisDA2017 [32]... |
| Dataset Splits | Yes | In unsupervised domain adaptation (UDA) we are given access to labeled source instances (x_S, y_S) ∼ P_S(X, Y), and unlabeled target instances x_T ∼ P_T(X)... For a target instance x_T ∼ P_T, we generate a committee of k masked versions. ... Did you specify all the training details (e.g., data splits, hyperparameters, how they were chosen)? [Yes] See Sec. 4.2 and supplementary. |
| Hardware Specification | No | The paper mentions "All experiments use PyTorch [48]" but does not specify any hardware such as GPUs or CPUs. The checklist indicates that the information is in the supplementary material, but not in the main paper: "Did you include the total amount of compute and the type of resources used (e.g., type of GPUs, internal cluster, or cloud provider)? [Yes] See supplementary." |
| Software Dependencies | No | All experiments use PyTorch [48]. ... We use the AdamW [46] optimizer. ... We use RandAugment [47]. No version numbers are given for these software libraries, which a reproducible description would require. |
| Experiment Setup | Yes | We pretrain on the combined source and target domain for 800 epochs (MAE) and 200 epochs (DINO). For pretraining, we linearly scale the learning rate to 4×10⁻⁴ (MAE) and 5×10⁻⁵ (DINO) during a 40-epoch warmup phase followed by a cosine decay. We use the AdamW [46] optimizer. For PACMAC, we use k = 2, m_r = 0.75, T = 50%, and α = 0.1. We use RandAugment [47] with N = 3 and M = 4.0 during pretraining and N = 1 and M = 2.0 during DA. On OfficeHome and DomainNet, we finetune on the source and adapt for 100 epochs each, and perform 10 epochs of each phase on VisDA. We use a learning rate of 2×10⁻⁴ and weight decay of 0.05. |
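
The Pseudocode row references Algorithm 1 (attention-conditioned masking), and the Dataset Splits row quotes the construction of a committee of k masked views per target instance. Below is a minimal sketch of that masking step in PyTorch, assuming per-patch attention scores from the ViT's CLS token are available; the function name, the contiguous-window offset scheme used to make the k views differ, and the ranking details are illustrative assumptions, not the authors' implementation.

```python
import torch

def attention_conditioned_masks(attn: torch.Tensor, k: int = 2,
                                mask_ratio: float = 0.75) -> torch.Tensor:
    """Build k masked views of an image from per-patch attention scores.

    attn: (N,) attention paid by the CLS token to each of the N patches.
    Returns a (k, N) boolean tensor; True marks a patch that is masked out.
    """
    n = attn.numel()
    n_masked = int(round(mask_ratio * n))             # e.g. 147 of 196 patches at 0.75
    order = torch.argsort(attn, descending=True)      # most-attended patches first
    masks = torch.zeros(k, n, dtype=torch.bool)
    for i in range(k):
        # Assumption: each committee member masks a contiguous window of the
        # attention ranking, shifted so the k views hide different salient regions.
        start = (i * n) // k
        ranks = (start + torch.arange(n_masked)) % n
        masks[i, order[ranks]] = True
    return masks

# Example: a 224x224 image with a ViT-B/16 backbone yields 14x14 = 196 patches.
committee = attention_conditioned_masks(torch.rand(196), k=2, mask_ratio=0.75)
```

As the paper's title and the quoted excerpts indicate, PACMAC then probes whether predictions are consistent across this committee of masked views to decide which target instances to self-train on.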
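
The Experiment Setup row quotes an AdamW optimizer with a 40-epoch linear warmup to the base learning rate followed by cosine decay. A minimal sketch of such a schedule with standard PyTorch schedulers is below; the warmup start factor, the per-epoch stepping, and reuse of the 0.05 weight decay (quoted for the adaptation phase) during pretraining are assumptions.

```python
import torch
from torch.optim.lr_scheduler import LinearLR, CosineAnnealingLR, SequentialLR

def build_pretraining_optimizer(model: torch.nn.Module,
                                base_lr: float = 4e-4,       # MAE value from the quote; 5e-5 for DINO
                                weight_decay: float = 0.05,  # assumed here; quoted for the DA phase
                                warmup_epochs: int = 40,
                                total_epochs: int = 800):
    optimizer = torch.optim.AdamW(model.parameters(), lr=base_lr,
                                  weight_decay=weight_decay)
    # Linear warmup to base_lr over the first 40 epochs, then cosine decay to the end.
    warmup = LinearLR(optimizer, start_factor=1e-3, end_factor=1.0,
                      total_iters=warmup_epochs)
    cosine = CosineAnnealingLR(optimizer, T_max=total_epochs - warmup_epochs)
    scheduler = SequentialLR(optimizer, schedulers=[warmup, cosine],
                             milestones=[warmup_epochs])
    return optimizer, scheduler  # call scheduler.step() once per epoch
```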