Evaluation of Test-Time Adaptation Under Computational Time Constraints

Authors: Motasem Alfarra, Hani Itani, Alejandro Pardo, Shyma Yaser Alhuwaider, Merey Ramazanova, Juan Camilo Perez, Zhipeng Cai, Matthias Müller, Bernard Ghanem

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | This paper proposes a novel online evaluation protocol for Test-Time Adaptation (TTA) methods, which penalizes slower methods by providing them with fewer samples for adaptation. We apply our proposed protocol to benchmark several TTA methods on multiple datasets and scenarios. Extensive experiments show that, when accounting for inference speed, simple and fast approaches can outperform more sophisticated but slower methods.
Researcher Affiliation | Collaboration | King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia; Intel Labs, Munich, Germany.
Pseudocode | No | The paper describes its protocols (the current protocol and the realistic online evaluation protocol) as numbered steps (Curr.1, Curr.2, RTTA 1, RTTA 2). However, these are descriptive steps of a process rather than structured pseudocode or an algorithm block labeled as such.
Open Source Code | Yes | Code: github.com/MotasemAlfarra/Online-Test-Time-Adaptation
Open Datasets | Yes | In all our experiments, we assume that fθ is a ResNet50-BN (He et al., 2016) trained on ImageNet (Deng et al., 2009)... We further extend our evaluation and consider CIFAR10-C, ImageNet-R (Hendrycks et al., 2021), and the more recent ImageNet-3DCC (Kar et al., 2022).
Dataset Splits | Yes | We report all our main results as the average across three seeds... We evaluate on the ImageNet-C dataset (Hendrycks & Dietterich, 2019) with a corruption level of 5 for all 15 corruptions. We further extend our evaluation and consider CIFAR10-C, ImageNet-R (Hendrycks et al., 2021), and the more recent ImageNet-3DCC (Kar et al., 2022)... Continual evaluation means the corruptions are presented in a sequence without resetting the model in between... starting with brightness and ending with the clean validation set.
Hardware Specification | No | The paper mentions "GPU operations" and "hardware dependence" in the appendix, but it does not specify any particular GPU models, CPU models, or other hardware components used for running the experiments. It only states the work was done at Intel Labs.
Software Dependencies | No | The paper mentions torch.cuda.synchronize() in Appendix A.1, which implies the use of PyTorch and CUDA. However, it does not provide version numbers for any software, libraries, or frameworks used.
Experiment Setup | Yes | We report all our main results as the average across three seeds... We assume that the stream S reveals batches of size 64... Regarding datasets, we... evaluate on the ImageNet-C dataset... with a corruption level of 5 for all 15 corruptions... We experiment with the stream speed by setting η ∈ {1/16, 1/8, 1/4, 1/2, 1}... We conduct a hyperparameter search for Tent... and experiment with different learning rates (the only hyper-parameter for Tent).
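The core idea quoted above (a slower TTA method is penalized by receiving fewer samples for adaptation, with the stream speed controlled by η) can be illustrated with a small simulation. This is a hedged sketch under assumptions, not the authors' implementation: `adapt_time` (a method's per-batch adaptation cost, measured relative to the stream's base batch interval) and the exact skipping rule are simplified here; in the paper's protocol, batches revealed while a method is busy are still predicted with the current model, just never used for adaptation.

```python
def simulate_online_stream(num_batches, adapt_time, eta=1.0):
    """Return the indices of stream batches a TTA method adapts on.

    The stream reveals one batch every 1/eta time units (eta is the
    stream-speed factor; smaller eta means a slower stream). While a
    method is still adapting to an earlier batch, newly revealed
    batches are only predicted with the current model, so slower
    methods adapt on fewer samples.
    """
    interval = 1.0 / eta
    adapted, busy_until = [], 0.0
    for i in range(num_batches):
        arrival = i * interval            # time at which batch i is revealed
        if arrival >= busy_until:         # method is idle: adapt on batch i
            adapted.append(i)
            busy_until = arrival + adapt_time
        # else: batch i is only predicted, never used for adaptation
    return adapted

# A method as fast as the stream adapts on every batch ...
assert len(simulate_online_stream(16, adapt_time=1.0)) == 16
# ... a 4x-slower method adapts on only every 4th batch at eta = 1 ...
assert simulate_online_stream(16, adapt_time=4.0, eta=1.0) == [0, 4, 8, 12]
# ... but slowing the stream to eta = 1/4 restores full adaptation.
assert len(simulate_online_stream(16, adapt_time=4.0, eta=0.25)) == 16
```

This makes concrete why sweeping η ∈ {1/16, ..., 1} matters: at small η even slow methods see every batch, while at η = 1 the fastest methods gain a large adaptation-sample advantage.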