Test Time Adaptation via Conjugate Pseudo-labels

Authors: Sachin Goyal, Mingjie Sun, Aditi Raghunathan, J. Zico Kolter

NeurIPS 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental Empirically, our approach consistently dominates other TTA alternatives over a wide range of domain adaptation benchmarks. Our approach is particularly of interest when applied to classifiers trained with novel loss functions, e.g., the recently-proposed Poly Loss [25] function, where it differs substantially from (and outperforms) an entropy-based loss. Further, we show that our conjugate based approach can also be interpreted as a kind of self-training using a very specific soft label, which we refer to as the conjugate pseudo-label. Overall, our method provides a broad framework for better understanding and improving test-time adaptation. Code is available at https://github.com/locuslab/tta_conjugate.
Researcher Affiliation Collaboration Sachin Goyal (1), Mingjie Sun (1), Aditi Raghunathan (1), Zico Kolter (1, 2); (1) Carnegie Mellon University, (2) Bosch Center for AI. {sachingo, mingjies, raditi, zkolter}@cs.cmu.edu
Pseudocode Yes The full procedure for test time adaptation via conjugate pseudo-labels is shown in Algorithm 1. (Algorithm 1 is presented on page 6; a minimal sketch of one adaptation step is given after the table.)
Open Source Code Yes Code is available at https://github.com/locuslab/tta_conjugate.
Open Datasets Yes We evaluate on the three common corruption benchmarks: adapting a classifier trained on CIFAR-10 to CIFAR-10-C, CIFAR-100 to CIFAR-100-C and ImageNet to ImageNet-C [15]. ... We also evaluate on three domain adaptation datasets: adapting a classifier trained on SVHN to MNIST, an ImageNet classifier to ImageNet-R [16] and adapting from synthetic to real data in VISDA-C [38].
Dataset Splits Yes We tune the learning rate (LR) and temperature (T) on the validation noises in the corruption benchmark by grid-search. LR is selected from {1e-1, 1e-2, ..., 1e-4} and T from {1, 2, ..., 5}.
Hardware Specification Yes All the experiments have been performed on A6000 GPUs.
Software Dependencies No The paper does not provide specific version numbers for software dependencies (e.g., libraries like PyTorch, TensorFlow, or specific Python versions).
Experiment Setup Yes We tune the learning rate (LR) and temperature (T) on the validation noises in the corruption benchmark by grid-search. LR is selected from {1e-1, 1e-2, ..., 1e-4} and T from {1, 2, ..., 5}. ... Following [50] and [40], we fine-tune by updating the learnable scale and shift parameters of the batch normalization layers across all adaptation losses. For each batch, batch normalization statistics are also updated, as suggested in [41].
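
To make the quoted procedure concrete, below is a minimal PyTorch sketch of one test-time adaptation step, assuming a classifier trained with cross-entropy; it is not the authors' released code from locuslab/tta_conjugate. It adapts only the learnable scale and shift parameters of the batch-normalization layers, re-estimates batch-norm statistics on each test batch, and minimizes the conjugate adaptation loss, which for cross-entropy amounts to self-training on the conjugate pseudo-label softmax(f(x)/T). The helper names, default temperature, and Adam optimizer are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def collect_bn_params(model):
    """Freeze all weights, then re-enable and return only the learnable
    scale (weight) and shift (bias) parameters of batch-norm layers."""
    model.requires_grad_(False)
    params = []
    for m in model.modules():
        if isinstance(m, (nn.BatchNorm1d, nn.BatchNorm2d)):
            m.requires_grad_(True)
            params.extend(p for p in (m.weight, m.bias) if p is not None)
    return params

def conjugate_pl_step(model, x, optimizer, temperature=2.0):
    """One test-time adaptation step (hypothetical sketch).

    For a classifier trained with cross-entropy, the conjugate pseudo-label
    is the softmax of the temperature-scaled logits, and self-training on it
    amounts to minimizing the (temperature-scaled) softmax entropy."""
    model.train()  # batch-norm statistics are re-estimated from this test batch
    logits = model(x)
    log_probs = F.log_softmax(logits / temperature, dim=1)
    probs = log_probs.exp()                        # conjugate pseudo-label
    loss = -(probs * log_probs).sum(dim=1).mean()  # self-training / entropy loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return logits.argmax(dim=1)  # predictions for the adapted batch

# Example usage (hypothetical names): adapt a pretrained classifier batch by batch.
# params = collect_bn_params(model)
# optimizer = torch.optim.Adam(params, lr=1e-3)
# for x, _ in test_loader:
#     preds = conjugate_pl_step(model, x, optimizer)
```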