Protected Test-Time Adaptation via Online Entropy Matching: A Betting Approach

Authors: Yarin Bar, Shalev Shaer, Yaniv Romano

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results demonstrate that our approach improves test-time accuracy under distribution shifts while maintaining accuracy and calibration in their absence, outperforming leading entropy minimization methods across various scenarios.
Researcher Affiliation | Academia | Yarin Bar (1), Shalev Shaer (2), Yaniv Romano (1,2); (1) Department of Computer Science, Technion - Israel Institute of Technology; (2) Department of Electrical and Computer Engineering, Technion - Israel Institute of Technology; {yarinbar,shalev.shaer}@campus.technion.ac.il, yromano@technion.ac.il
Pseudocode | Yes | Algorithm 2 in the Appendix summarizes the entire adaptation process of POEM. (...) Algorithm 1 SF-OGD Step (...) Algorithm 2 Protected Online Entropy Matching (POEM) (a generic SF-OGD sketch appears after this table)
Open Source Code | Yes | A software package that implements our methods is available at https://github.com/yarinbar/poem.
Open Datasets | Yes | Our experiments span the ImageNet, ImageNet-C, CIFAR10-C, and CIFAR100-C datasets for evaluating the robustness to shifts induced by corruptions, and the OfficeHome dataset for domain adaptation.
Dataset Splits | Yes | We randomly sample 25% of the examples from the ImageNet validation set as an unlabelled holdout set. (...) Given the lack of a predefined data structure, we split the dataset into an 80% training set from the Real World samples, with the remainder serving as validation and holdout sets for our method and EATA. (an index-level split sketch appears after this table)
Hardware Specification | Yes | All experiments are conducted on our local server, equipped with 16 NVIDIA A40 (49GB) GPUs, 192 Intel(R) Xeon(R) Gold 6336Y CPUs, and 1TB of RAM. Each experiment uses a single GPU and 8 CPUs.
Software Dependencies | No | The paper mentions using the `timm` library, the SAR and CoTTA repositories, `torch-hub`, and the SGD and Adam optimizers, but does not specify version numbers for these software components.
Experiment Setup | Yes | In all experiments conducted in this paper, we choose the following set of hyperparameters, defined in Algorithm 1: D = 1.8, γ = 1/8. (...) For POEM specifically, we implement an action delay of 100 examples throughout the experiments in this paper. (...) The learning rate (η in Algorithm 2) calculation follows these formulas: ViT: learning rate = 0.001 × batch size / 64; ResNet50: learning rate = 0.00025 × batch size / 64 × 2. (...) We use the SGD optimizer with momentum of 0.9 for self-training. (...) We use λ = 0.40 log(1000). (...) A batch size of 1 is consistently used throughout all of the experiments. (a configuration sketch appears after this table)
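
The Pseudocode row names an SF-OGD step (Algorithm 1) whose hyperparameters D and γ reappear in the Experiment Setup row. Below is a minimal sketch of a generic scale-free online gradient descent update with projection onto a ball of radius D, written in PyTorch; the step-size rule, the projection, and the roles assigned to D and γ are assumptions based on the standard SF-OGD formulation, not a transcription of the paper's Algorithm 1.

```python
import torch

def sf_ogd_step(x, grad, grad_sq_sum, D=1.8, gamma=1 / 8):
    """Generic scale-free online gradient descent (SF-OGD) step.

    Assumed form: step size gamma * D / sqrt(sum of squared gradient norms),
    followed by projection onto the ball of radius D. The paper's Algorithm 1
    may assign D and gamma different roles.
    """
    grad_sq_sum = grad_sq_sum + grad.pow(2).sum()       # accumulate ||g_s||^2
    eta = gamma * D / (grad_sq_sum.sqrt() + 1e-12)      # scale-free step size
    x = x - eta * grad                                  # gradient step
    norm = x.norm()
    if norm > D:                                        # project back onto ||x|| <= D
        x = x * (D / norm)
    return x, grad_sq_sum
```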
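
As a companion to the Dataset Splits row, the index-level sketch below illustrates the described 25% ImageNet-validation holdout and the 80% training split of the OfficeHome Real World domain. The dataset sizes, the random seed, and the even validation/holdout division of the remaining 20% are assumptions for illustration, not the authors' code.

```python
import numpy as np

rng = np.random.default_rng(0)                      # seed is an arbitrary choice

# 25% of the ImageNet validation set as an unlabelled holdout set.
n_imagenet_val = 50_000
holdout_idx = rng.choice(n_imagenet_val, size=n_imagenet_val // 4, replace=False)

# OfficeHome "Real World" domain: 80% training, remainder split between
# validation and holdout (the split of that remainder is assumed to be even).
n_real_world = 4357                                 # approximate domain size
perm = rng.permutation(n_real_world)
n_train = int(0.8 * n_real_world)
train_idx = perm[:n_train]
val_idx, holdout_rw_idx = np.array_split(perm[n_train:], 2)
```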
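
The Experiment Setup row can be read as the configuration sketched below. Only the numeric values come from the quoted text; the backbone choice, the torchvision loading call, and the natural-log reading of log(1000) are assumptions.

```python
import math
import torch
from torchvision.models import resnet50

batch_size = 1                                     # used throughout the experiments
arch = "resnet50"                                  # or "vit"

if arch == "vit":
    lr = 0.001 * batch_size / 64                   # ViT learning-rate rule
else:
    lr = 0.00025 * batch_size / 64 * 2             # ResNet50 rule, with the quoted factor of 2

model = resnet50(weights="IMAGENET1K_V1")          # placeholder backbone for illustration
optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)

lam = 0.40 * math.log(1000)                        # lambda = 0.40 * log(1000); natural log assumed
D, gamma = 1.8, 1 / 8                              # SF-OGD hyperparameters (Algorithm 1)
action_delay = 100                                 # POEM action delay, in examples
```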