HYPO: Hyperspherical Out-Of-Distribution Generalization

Authors: Haoyue Bai, Yifei Ming, Julian Katz-Samuels, Yixuan Li

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Through extensive experiments on challenging OOD benchmarks, we demonstrate that our approach outperforms competitive baselines and achieves superior performance.
Researcher Affiliation | Collaboration | Haoyue Bai (1), Yifei Ming (1), Julian Katz-Samuels (2), Yixuan Li (1); (1) Department of Computer Sciences, University of Wisconsin-Madison; (2) Amazon
Pseudocode | Yes | An end-to-end pseudo algorithm is summarized in Appendix A. (A hedged sketch of the corresponding loss terms follows this table.)
Open Source Code | Yes | Code is available at https://github.com/deeplearning-wisc/hypo.
Open Datasets | Yes | Datasets. Following the common benchmarks in literature, we use CIFAR-10 (Krizhevsky et al., 2009) as the in-distribution data. We use CIFAR-10-C (Hendrycks & Dietterich, 2019) as OOD data... In addition to CIFAR-10, we conduct experiments on popular benchmarks including PACS (Li et al., 2017), Office-Home (Gulrajani & Lopez-Paz, 2020), and VLCS (Gulrajani & Lopez-Paz, 2020)... Results on additional OOD datasets Terra Incognita (Gulrajani & Lopez-Paz, 2020), and ImageNet can be found in Appendix F and Appendix G.
Dataset Splits | Yes | We adopt the leave-one-domain-out evaluation protocol and use the training domain validation set for model selection (Gulrajani & Lopez-Paz, 2020), where the validation set is pooled from all training domains. (A small split sketch follows this table.)
Hardware Specification | Yes | All experiments are conducted on NVIDIA GeForce RTX 2080 Ti GPUs for small to medium batch sizes and NVIDIA A100 and RTX A6000 GPUs for large batch sizes.
Software Dependencies | Yes | Our method is implemented with PyTorch 1.10.
Experiment Setup | Yes | For CIFAR-10, we train the model from scratch for 500 epochs using an initial learning rate of 0.5 and cosine scheduling, with a batch size of 512. ... We set the default temperature τ as 0.1 and the prototype update factor α as 0.95. ... The search distribution in our experiments for the learning rate hyperparameter is lr ∈ {0.005, 0.002, 0.001, 0.0005, 0.0002, 0.0001, 0.00005}. The search space for the batch size is bs ∈ {32, 64}. The loss weight λ for balancing our loss function (L = λ·L_var + L_sep) is selected from λ ∈ {1.0, 2.0, 4.0}. (An optimizer and search-grid sketch follows this table.)
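
The Pseudocode row points to the end-to-end algorithm in Appendix A. As a minimal illustration of the two loss terms named in the Experiment Setup row, the PyTorch sketch below implements a hyperspherical prototype objective: a variation term that pulls L2-normalized features toward their class prototype, a separation term that pushes prototypes apart, and an EMA prototype update with α = 0.95. The function names, the exact form of the separation term, and the normalization details are assumptions made for illustration, not the authors' implementation; the official code is at https://github.com/deeplearning-wisc/hypo.

```python
# Minimal sketch of a hyperspherical prototype objective in the spirit of HYPO.
# Assumptions: function names and the exact separation formulation are illustrative.
import torch
import torch.nn.functional as F

def variation_loss(features, labels, prototypes, tau=0.1):
    """Intra-class compactness: align L2-normalized features with their class prototype."""
    z = F.normalize(features, dim=1)                  # project embeddings onto the unit hypersphere
    mu = F.normalize(prototypes, dim=1)               # (num_classes, feat_dim) class prototypes
    logits = z @ mu.t() / tau                         # cosine similarities scaled by temperature tau
    return F.cross_entropy(logits, labels)

def separation_loss(prototypes, tau=0.1):
    """Inter-class separation: discourage prototypes from collapsing onto one another."""
    mu = F.normalize(prototypes, dim=1)
    sim = mu @ mu.t() / tau                           # pairwise prototype similarities
    off_diag = ~torch.eye(len(mu), dtype=torch.bool, device=mu.device)
    return torch.logsumexp(sim[off_diag].view(len(mu), -1), dim=1).mean()

@torch.no_grad()
def ema_update_prototypes(prototypes, features, labels, alpha=0.95):
    """Exponential-moving-average prototype update (alpha = 0.95 per the setup row)."""
    z = F.normalize(features, dim=1)
    for c in labels.unique():
        batch_mean = z[labels == c].mean(dim=0)
        prototypes[c] = F.normalize(alpha * prototypes[c] + (1.0 - alpha) * batch_mean, dim=0)
    return prototypes

# Toy usage, combining the terms as quoted in the setup row: L = λ·L_var + L_sep.
feats = torch.randn(8, 16)
labels = torch.randint(0, 4, (8,))
protos = F.normalize(torch.randn(4, 16), dim=1)
loss = 2.0 * variation_loss(feats, labels, protos) + separation_loss(protos)
```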
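
The Dataset Splits row quotes the leave-one-domain-out protocol with a training-domain validation set (Gulrajani & Lopez-Paz, 2020). The sketch below illustrates that protocol on toy data; the PACS domain names and placeholder file names are only examples, and the 20% validation fraction is an assumption rather than a value reported in the paper.

```python
# Sketch of leave-one-domain-out with a validation set pooled from all training domains.
# The validation fraction (20%) and the toy sample lists are assumptions.
import random

def leave_one_domain_out(domains, target, val_fraction=0.2, seed=0):
    """Hold out `target` entirely; pool a validation split from every training domain."""
    rng = random.Random(seed)
    train, val = [], []
    for name, samples in domains.items():
        if name == target:
            continue                              # the held-out domain is never used for training
        samples = list(samples)
        rng.shuffle(samples)
        cut = int(len(samples) * (1 - val_fraction))
        train.extend(samples[:cut])               # training portion of each source domain
        val.extend(samples[cut:])                 # validation set pooled across source domains
    return train, val, list(domains[target])      # model selection on `val`, OOD evaluation on `target`

# Toy usage with PACS-style domain names and placeholder file names.
pacs = {d: [f"{d}/img_{i}.jpg" for i in range(100)]
        for d in ("art_painting", "cartoon", "photo", "sketch")}
train_set, val_set, test_set = leave_one_domain_out(pacs, target="sketch")
```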
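
The Experiment Setup row reports the CIFAR-10 training schedule (500 epochs, initial learning rate 0.5, cosine scheduling, batch size 512) and the hyperparameter grids searched on the other benchmarks. The snippet below wires those reported values into a PyTorch optimizer and scheduler; the choice of SGD, the momentum, the weight decay, and the placeholder model are assumptions, since the quoted text does not state them.

```python
# Reported values: 500 epochs, initial lr 0.5, cosine scheduling, batch size 512.
# Assumptions: SGD with momentum 0.9 and weight decay 1e-4, and the placeholder model.
import torch

model = torch.nn.Linear(512, 128)  # placeholder for the encoder trained from scratch
optimizer = torch.optim.SGD(model.parameters(), lr=0.5, momentum=0.9, weight_decay=1e-4)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=500)

for epoch in range(500):
    # ... iterate over CIFAR-10 with batch size 512, compute L = λ·L_var + L_sep,
    # backpropagate, and call optimizer.step() per batch ...
    scheduler.step()  # decay the learning rate once per epoch along a cosine curve

# Search grids quoted for the other benchmarks (PACS, Office-Home, VLCS).
search_space = {
    "lr": [0.005, 0.002, 0.001, 0.0005, 0.0002, 0.0001, 0.00005],
    "batch_size": [32, 64],
    "lambda": [1.0, 2.0, 4.0],  # weight on the variation term in L = λ·L_var + L_sep
}
```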