Test-Time Training with Self-Supervision for Generalization under Distribution Shifts

Authors: Yu Sun, Xiaolong Wang, Zhuang Liu, John Miller, Alexei Efros, Moritz Hardt

ICML 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We experimentally validate our method in the context of object recognition on several standard benchmarks. These include images with diverse types of corruption at various levels (Hendrycks & Dietterich, 2019), video frames of moving objects (Shankar et al., 2019), and a new test set of unknown shifts collected by Recht et al. (2018). Our algorithm makes substantial improvements under distribution shifts, while maintaining the same performance on the original distribution.
Researcher Affiliation | Academia | 1 University of California, Berkeley; 2 University of California, San Diego; 3 MH is a paid consultant for Twitter. Correspondence to: Yu Sun <yusun@berkeley.edu>.
Pseudocode | No | The paper contains mathematical equations describing the model and optimization problems but no structured pseudocode or clearly labeled algorithm blocks.
Open Source Code | Yes | Project website: https://test-time-training.github.io/.
Open Datasets | Yes | We use ResNets (He et al., 2016b), which are constructed differently for CIFAR-10 (Krizhevsky & Hinton, 2009) and ImageNet (Russakovsky et al., 2015).
Dataset Splits | No | For CIFAR-10, the paper states '50K images for training, and 10K images for testing,' lacking an explicit validation split. For ImageNet, it notes '1.2M images for training and the 50K validation images are used as the test set,' which combines validation and test rather than providing a distinct validation set for hyperparameter tuning. (The stated splits are sketched in code after the table.)
Hardware Specification | No | The paper does not specify any particular CPU or GPU models or other hardware used to run the experiments, only mentioning 'ResNets' and 'Group Normalization'.
Software Dependencies | No | The paper mentions general software components like 'stochastic gradient descent' and 'Group Normalization' but does not provide specific version numbers for any libraries, frameworks (e.g., PyTorch, TensorFlow), or other software dependencies.
Experiment Setup | Yes | For Test-Time Training (Equation 3), we use stochastic gradient descent with the learning rate set to that of the last epoch during training, which is 0.001 in all our experiments. We set weight decay and momentum to zero during Test-Time Training... For the standard version of Test-Time Training, we take ten gradient steps... For the online version of Test-Time Training, we take only one gradient step... We use random crop and random horizontal flip for data augmentation.
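
The experiment-setup details quoted above can be illustrated with a minimal, hypothetical PyTorch-style sketch (the paper does not name a framework). Here `encoder`, `ssl_head`, and `cls_head` are placeholder modules for the shared feature extractor, the self-supervised head, and the main classifier, and rotation prediction is an assumption about the paper's auxiliary task.

```python
# Hypothetical PyTorch-style sketch of the test-time update quoted above.
# `encoder`, `ssl_head`, and `cls_head` are placeholder modules: the shared
# feature extractor, the self-supervised head, and the main classifier.
import torch
import torch.nn.functional as F

def test_time_train(encoder, ssl_head, cls_head, x, num_steps=10, lr=0.001):
    """Adapt the shared encoder on a batch of test inputs, then predict.

    num_steps=10 corresponds to the standard version described above;
    the online version would take a single step (num_steps=1).
    """
    # SGD with weight decay and momentum set to zero, as in the quoted setup.
    # Only the shared encoder is adapted here; updating the self-supervised
    # head as well is an implementation choice not settled by the quote.
    opt = torch.optim.SGD(encoder.parameters(), lr=lr,
                          momentum=0.0, weight_decay=0.0)

    for _ in range(num_steps):
        # Self-supervised objective on the test images themselves: predict
        # which of four rotations was applied (rotation prediction is an
        # assumption about the auxiliary task). The random crop and
        # horizontal flip mentioned above are omitted for brevity.
        angles = torch.randint(0, 4, (x.size(0),))
        x_rot = torch.stack([torch.rot90(img, int(k), dims=(1, 2))
                             for img, k in zip(x, angles)])
        loss = F.cross_entropy(ssl_head(encoder(x_rot)), angles)
        opt.zero_grad()
        loss.backward()
        opt.step()

    # Prediction with the adapted encoder.
    with torch.no_grad():
        return cls_head(encoder(x)).argmax(dim=1)
```

In the standard version the adapted parameters are reset for each new test input, while the online version keeps updating them as test samples arrive; neither reset logic is shown in this sketch.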
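
For the Dataset Splits row above, a minimal loading sketch might look as follows, assuming torchvision (not named in the paper); root paths and transforms are placeholders, and ImageNet must already be available locally.

```python
# Minimal sketch of the quoted splits, assuming torchvision (not named in the
# paper). Root paths and transforms are placeholders.
from torchvision import datasets, transforms

to_tensor = transforms.ToTensor()

# CIFAR-10: 50K training images, 10K test images, no separate validation split.
cifar_train = datasets.CIFAR10("data/cifar10", train=True, download=True, transform=to_tensor)
cifar_test = datasets.CIFAR10("data/cifar10", train=False, download=True, transform=to_tensor)

# ImageNet: 1.2M training images; the 50K-image validation split serves as the test set.
imagenet_train = datasets.ImageNet("data/imagenet", split="train", transform=to_tensor)
imagenet_test = datasets.ImageNet("data/imagenet", split="val", transform=to_tensor)
```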