Protected Test-Time Adaptation via Online Entropy Matching: A Betting Approach
Authors: Yarin Bar, Shalev Shaer, Yaniv Romano
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results demonstrate that our approach improves test-time accuracy under distribution shifts while maintaining accuracy and calibration in their absence, outperforming leading entropy minimization methods across various scenarios. |
| Researcher Affiliation | Academia | Yarin Bar¹, Shalev Shaer², Yaniv Romano¹,² — ¹Department of Computer Science, Technion Israel Institute of Technology; ²Department of Electrical and Computer Engineering, Technion Israel Institute of Technology. {yarinbar,shalev.shaer}@campus.technion.ac.il, yromano@technion.ac.il |
| Pseudocode | Yes | Algorithm 2 in the Appendix summarizes the entire adaptation process of POEM. (...) Algorithm 1 SF-OGD Step (...) Algorithm 2 Protected Online Entropy Matching (POEM) |
| Open Source Code | Yes | A software package that implements our methods is available at https://github.com/yarinbar/poem. |
| Open Datasets | Yes | Our experiments span ImageNet, ImageNet-C, CIFAR10-C, and CIFAR100-C datasets for evaluating the robustness to shifts induced by corruptions, and the Office-Home dataset for domain adaptation. |
| Dataset Splits | Yes | We randomly sample 25% of the examples from the ImageNet validation set as an unlabelled holdout set. (...) Given the lack of a predefined data structure, we split the dataset into an 80% training set from the Real World samples, with the remainder serving as validation and holdout sets for our method and EATA. |
| Hardware Specification | Yes | All experiments are conducted on our local server, equipped with 16 NVIDIA A40 GPUs (49GB each), 192 Intel(R) Xeon(R) Gold 6336Y CPUs, and 1TB of RAM. Each experiment uses a single GPU and 8 CPUs. |
| Software Dependencies | No | The paper mentions using the `timm` library, the SAR and COTTA repositories, `torch-hub`, and the SGD and Adam optimizers, but does not specify version numbers for these software components. |
| Experiment Setup | Yes | In all experiments conducted in this paper, we choose the following set of hyperparameters, defined in Algorithm 1: D = 1.8, γ = 1/8. (...) For POEM specifically, we implement an action delay of 100 examples throughout the experiments in this paper. (...) The learning rate (η in Algorithm 2) is computed as follows: ViT: learning rate = 0.001 × (batch size / 64); ResNet50: learning rate = 0.00025 × (batch size / 64). (...) We use the SGD optimizer with momentum of 0.9 for self-training. (...) We use λ = 0.40 log(1000). (...) A batch size of 1 is consistently used throughout all of the experiments. |
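
The Dataset Splits row quotes two separate protocols: a 25% unlabelled holdout drawn from the ImageNet validation set, and an 80% training split of the Office-Home Real World samples with the remainder used as validation and holdout sets. Below is a minimal sketch of how such splits could be reproduced; the random seed, the Real World sample count, and the equal division of the remainder between validation and holdout are assumptions for illustration, not details stated in the paper.

```python
# Hedged sketch of the splits described in the "Dataset Splits" row.
# Seed, Real World sample count, and the 50/50 val/holdout division are assumptions.
import numpy as np

rng = np.random.default_rng(0)  # assumed seed

# ImageNet: 25% of the 50,000 validation images as an unlabelled holdout set.
imagenet_val_indices = np.arange(50_000)
holdout_size = int(0.25 * len(imagenet_val_indices))
holdout_idx = rng.choice(imagenet_val_indices, size=holdout_size, replace=False)
stream_idx = np.setdiff1d(imagenet_val_indices, holdout_idx)  # remaining 75% for the test stream

# Office-Home (Real World domain): 80% training, remainder split between
# validation and holdout for POEM and EATA.
def split_real_world(num_samples: int, train_frac: float = 0.8):
    perm = rng.permutation(num_samples)
    n_train = int(train_frac * num_samples)
    n_val = (num_samples - n_train) // 2  # assumed equal val/holdout split
    return perm[:n_train], perm[n_train:n_train + n_val], perm[n_train + n_val:]

num_real_world = 4357  # Office-Home Real World sample count, used here for illustration
train_idx, val_idx, ho_idx = split_real_world(num_real_world)
```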
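The Experiment Setup row lists the reported hyperparameters: D = 1.8, γ = 1/8, an action delay of 100 examples, a base learning rate of 0.001 (ViT) or 0.00025 (ResNet50) taken here as scaled by batch size / 64, SGD with momentum 0.9 for self-training, λ = 0.40 log(1000), and batch size 1. The snippet below gathers them into a single configuration and builds the corresponding optimizer; the field names, the learning-rate scaling reading, and the placeholder model are assumptions, not the naming used in the authors' repository.

```python
# Hedged sketch: collecting the reported hyperparameters into one configuration.
# Field names and the placeholder model are assumptions for illustration;
# numerical values come from the "Experiment Setup" row above.
import math
import torch

config = {
    "D": 1.8,                      # betting parameter (Algorithm 1)
    "gamma": 1.0 / 8.0,            # γ in Algorithm 1
    "action_delay": 100,           # POEM starts adapting only after 100 examples
    "batch_size": 1,               # batch size of 1 in all experiments
    "base_lr": {"vit": 0.001, "resnet50": 0.00025},  # per-architecture base rates
    "momentum": 0.9,               # SGD momentum for self-training
    "lam": 0.40 * math.log(1000),  # λ = 0.40 log(1000)
}

def learning_rate(arch: str, batch_size: int) -> float:
    # Reported scaling, read here as: base learning rate × (batch size / 64).
    return config["base_lr"][arch] * batch_size / 64

model = torch.nn.Linear(10, 2)  # placeholder; the paper adapts ViT / ResNet50 backbones
optimizer = torch.optim.SGD(
    model.parameters(),
    lr=learning_rate("resnet50", config["batch_size"]),
    momentum=config["momentum"],
)
```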