Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Test Time Adaptation via Conjugate Pseudo-labels
Authors: Sachin Goyal, Mingjie Sun, Aditi Raghunathan, J. Zico Kolter
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, our approach consistently dominates other TTA alternatives over a wide range of domain adaptation benchmarks. Our approach is particularly of interest when applied to classifiers trained with novel loss functions, e.g., the recently proposed Poly Loss [25] function, where it differs substantially from (and outperforms) an entropy-based loss. Further, we show that our conjugate-based approach can also be interpreted as a kind of self-training using a very specific soft label, which we refer to as the conjugate pseudo-label. Overall, our method provides a broad framework for better understanding and improving test-time adaptation. Code is available at https://github.com/locuslab/tta_conjugate. |
| Researcher Affiliation | Collaboration | Sachin Goyal¹, Mingjie Sun¹, Aditi Raghunathan¹, Zico Kolter¹ ²; ¹Carnegie Mellon University, ²Bosch Center for AI |
| Pseudocode | Yes | The full procedure for test time adaptation via conjugate pseudo-labels is shown in Algorithm 1. (Algorithm 1 is presented on page 6). |
| Open Source Code | Yes | Code is available at https://github.com/locuslab/tta_conjugate. |
| Open Datasets | Yes | We evaluate on the three common corruption benchmarks: adapting a classifier trained on CIFAR-10 to CIFAR-10-C, CIFAR-100 to CIFAR-100-C and ImageNet to ImageNet-C [15]. ... We also evaluate on three domain adaptation datasets: adapting a classifier trained on SVHN to MNIST, an ImageNet classifier to ImageNet-R [16] and adapting from synthetic to real data in VISDA-C [38]. |
| Dataset Splits | Yes | We tune the learning rate (LR) and temperature (T) on the validation noises in the corruption benchmark by grid-search. LR is selected from {1e-1, 1e-2, ..., 1e-4} and T from {1, 2, ..., 5}. All the experiments have been performed on A6000 GPUs. |
| Hardware Specification | Yes | All the experiments have been performed on A6000 GPUs. |
| Software Dependencies | No | The paper does not provide specific version numbers for software dependencies (e.g., libraries like PyTorch, TensorFlow, or specific Python versions). |
| Experiment Setup | Yes | We tune the learning rate (LR) and temperature (T) on the validation noises in the corruption benchmark by grid-search. LR is selected from {1e-1, 1e-2, ..., 1e-4} and T from {1, 2, ..., 5}. ... Following [50] and [40], we fine-tune by updating the learnable scale and shift parameters of the batch normalization layers across all adaptation losses. For each batch, batch normalization statistics are also updated, as suggested in [41]. |
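The Research Type row describes the conjugate pseudo-label as a specific soft label used for self-training. A minimal sketch for the cross-entropy case, assuming the training loss has the form L(h, y) = f(h) - yᵀh with f = logsumexp, and assuming the temperature T enters by dividing the logits: the pseudo-label is ∇f(h), which is the softmax, and substituting it back into the loss yields the softmax entropy. All function names here are hypothetical and are not taken from the locuslab/tta_conjugate repository.

```python
import math
from typing import List, Sequence


def softmax(h: Sequence[float]) -> List[float]:
    """Numerically stable softmax; for cross-entropy this equals grad f(h)."""
    m = max(h)
    exps = [math.exp(v - m) for v in h]
    total = sum(exps)
    return [e / total for e in exps]


def logsumexp(h: Sequence[float]) -> float:
    """f(h) = log sum_i exp(h_i): the label-independent part of cross-entropy."""
    m = max(h)
    return m + math.log(sum(math.exp(v - m) for v in h))


def conjugate_pseudo_label(h: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Soft label y_CPL = grad f(h / T). Dividing logits by T is an assumption."""
    return softmax([v / temperature for v in h])


def conjugate_loss(h: Sequence[float], temperature: float = 1.0) -> float:
    """Loss L(h, y) = f(h) - y^T h evaluated at y = y_CPL (logits scaled by 1/T)."""
    scaled = [v / temperature for v in h]
    y = softmax(scaled)
    return logsumexp(scaled) - sum(yi * hi for yi, hi in zip(y, scaled))


def entropy(p: Sequence[float]) -> float:
    """Shannon entropy in nats, skipping zero-probability entries."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0.0)


if __name__ == "__main__":
    logits = [2.0, 0.5, -1.0]
    print(conjugate_pseudo_label(logits))
    # For cross-entropy the conjugate loss coincides with softmax entropy.
    print(conjugate_loss(logits), entropy(softmax(logits)))
```

This identity is why, for cross-entropy-trained classifiers, the conjugate loss reduces to entropy minimization; for other training losses such as Poly Loss, ∇f(h) is no longer a plain softmax and the two losses diverge, which is the case the row highlights.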
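The Pseudocode and Experiment Setup rows describe adaptation that updates only the batch-normalization scale and shift, with normalization statistics recomputed on each test batch. The toy sketch below assumes a "network" whose single BN layer output is used directly as the logits, uses the cross-entropy conjugate loss (softmax entropy) with hand-derived gradients, and takes one gradient step per incoming batch. `batch_normalize`, `entropy_and_grad`, and `adapt_stream` are hypothetical names; the actual method applies this to every BN layer of a deep network using automatic differentiation.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def batch_normalize(x: Matrix, eps: float = 1e-5) -> Matrix:
    """Normalize each feature column with the current batch's mean and variance."""
    n, d = len(x), len(x[0])
    means = [sum(row[j] for row in x) / n for j in range(d)]
    variances = [sum((row[j] - means[j]) ** 2 for row in x) / n for j in range(d)]
    return [[(row[j] - means[j]) / math.sqrt(variances[j] + eps) for j in range(d)]
            for row in x]


def entropy_and_grad(h: Sequence[float]) -> Tuple[float, List[float]]:
    """Softmax entropy H(h) and its gradient dH/dh_j = -p_j (log p_j + H)."""
    m = max(h)
    lse = m + math.log(sum(math.exp(v - m) for v in h))
    log_p = [v - lse for v in h]  # log-softmax avoids log(0)
    p = [math.exp(lp) for lp in log_p]
    ent = -sum(pi * lp for pi, lp in zip(p, log_p))
    grad = [-pi * (lp + ent) for pi, lp in zip(p, log_p)]
    return ent, grad


def adapt_stream(batches, gamma, beta, lr=0.05, temperature=1.0):
    """One gradient step per incoming batch, updating only the BN affine parameters."""
    gamma, beta = list(gamma), list(beta)
    losses = []
    for x in batches:
        z = batch_normalize(x)  # BN statistics come from the current test batch
        n, d = len(z), len(gamma)
        g_gamma, g_beta, total = [0.0] * d, [0.0] * d, 0.0
        for row in z:
            # Toy model: logits are the BN output, scaled by 1/T.
            h = [(gamma[j] * row[j] + beta[j]) / temperature for j in range(d)]
            ent, dh = entropy_and_grad(h)
            total += ent / n
            for j in range(d):
                # Chain rule through h_j = (gamma_j * z_j + beta_j) / T.
                g_gamma[j] += dh[j] * row[j] / (temperature * n)
                g_beta[j] += dh[j] / (temperature * n)
        losses.append(total)  # mean loss on this batch before its update
        gamma = [g - lr * gg for g, gg in zip(gamma, g_gamma)]
        beta = [b - lr * gb for b, gb in zip(beta, g_beta)]
    return gamma, beta, losses
```

Feeding the same toy batch repeatedly, for example `adapt_stream([batch] * 10, [1.0] * 3, [0.0] * 3)`, should show the mean loss falling at each step for a small learning rate; in a real test stream each step would see a fresh batch.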
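The Dataset Splits and Experiment Setup rows report a grid search over LR ∈ {1e-1, ..., 1e-4} and T ∈ {1, ..., 5} on the validation noises. A minimal sketch of that selection loop follows; `validation_error` is a hypothetical callback standing in for running adaptation on the held-out corruptions and returning an error rate, and breaking ties by grid order is an arbitrary choice not stated in the source.

```python
import itertools
from typing import Callable, Sequence, Tuple

# Grids as reported: LR in {1e-1, ..., 1e-4}, T in {1, ..., 5}.
LR_GRID: Sequence[float] = (1e-1, 1e-2, 1e-3, 1e-4)
TEMPERATURE_GRID: Sequence[int] = (1, 2, 3, 4, 5)


def grid_search(
    validation_error: Callable[[float, int], float],
    lrs: Sequence[float] = LR_GRID,
    temperatures: Sequence[int] = TEMPERATURE_GRID,
) -> Tuple[float, int]:
    """Return the (lr, T) pair with the lowest validation error.

    `validation_error` is a hypothetical callback; min() keeps the first
    pair in grid order when several pairs tie.
    """
    return min(itertools.product(lrs, temperatures),
               key=lambda cfg: validation_error(*cfg))
```

With 4 learning rates and 5 temperatures this evaluates 20 configurations, each requiring a full adaptation run over the validation corruptions.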