HYPO: Hyperspherical Out-of-Distribution Generalization
Authors: Haoyue Bai, Yifei Ming, Julian Katz-Samuels, Yixuan Li
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through extensive experiments on challenging OOD benchmarks, we demonstrate that our approach outperforms competitive baselines and achieves superior performance. |
| Researcher Affiliation | Collaboration | Haoyue Bai¹, Yifei Ming¹, Julian Katz-Samuels², Yixuan Li¹; ¹Department of Computer Sciences, University of Wisconsin-Madison; ²Amazon |
| Pseudocode | Yes | An end-to-end pseudo algorithm is summarized in Appendix A. (An illustrative loss sketch appears below the table.) |
| Open Source Code | Yes | Code is available at https://github.com/deeplearning-wisc/hypo. |
| Open Datasets | Yes | Datasets. Following the common benchmarks in literature, we use CIFAR-10 (Krizhevsky et al., 2009) as the in-distribution data. We use CIFAR-10-C (Hendrycks & Dietterich, 2019) as OOD data... In addition to CIFAR-10, we conduct experiments on popular benchmarks including PACS (Li et al., 2017), Office-Home (Gulrajani & Lopez-Paz, 2020), and VLCS (Gulrajani & Lopez-Paz, 2020)... Results on additional OOD datasets Terra Incognita (Gulrajani & Lopez-Paz, 2020), and ImageNet can be found in Appendix F and Appendix G. |
| Dataset Splits | Yes | We adopt the leave-one-domain-out evaluation protocol and use the training domain validation set for model selection (Gulrajani & Lopez-Paz, 2020), where the validation set is pooled from all training domains. (The protocol is sketched below the table.) |
| Hardware Specification | Yes | All experiments are conducted on NVIDIA GeForce RTX 2080 Ti GPUs for small to medium batch sizes and NVIDIA A100 and RTX A6000 GPUs for large batch sizes. |
| Software Dependencies | Yes | Our method is implemented with PyTorch 1.10. |
| Experiment Setup | Yes | For CIFAR-10, we train the model from scratch for 500 epochs using an initial learning rate of 0.5 and cosine scheduling, with a batch size of 512. ... We set the default temperature τ as 0.1 and the prototype update factor α as 0.95. ... The search distribution in our experiments for the learning rate hyperparameter is: lr ∈ {0.005, 0.002, 0.001, 0.0005, 0.0002, 0.0001, 0.00005}. The search space for the batch size is bs ∈ {32, 64}. The loss weight λ for balancing our loss function (L = λ·L_var + L_sep) is selected from λ ∈ {1.0, 2.0, 4.0}. (The loss terms and search grid are sketched below the table.) |
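
The "Pseudocode" row points to the full algorithm in Appendix A of the paper, and the setup row names a loss of the form L = λ·L_var + L_sep with temperature τ = 0.1 and prototype update factor α = 0.95. The sketch below is one plausible PyTorch rendering of such a hyperspherical variation/separation loss, written only to illustrate those quoted hyperparameters: the class name, the EMA prototype update, and the exact form of each term are assumptions, not the authors' code; the real implementation lives in the linked repository.

```python
import torch
import torch.nn.functional as F


class HypoLossSketch(torch.nn.Module):
    """Unofficial sketch of a hyperspherical variation + separation loss.

    Assumptions (not taken from the paper's code): features and class
    prototypes live on the unit hypersphere, prototypes are tracked with an
    exponential-moving-average (EMA) update with factor alpha = 0.95, and the
    total loss is L = lambda * L_var + L_sep with temperature tau = 0.1,
    matching the hyperparameters quoted in the table above.
    """

    def __init__(self, num_classes: int, feat_dim: int,
                 tau: float = 0.1, alpha: float = 0.95, lam: float = 2.0):
        super().__init__()
        self.tau, self.alpha, self.lam = tau, alpha, lam
        # Prototypes are buffers (updated by EMA, not by gradients), initialized on the sphere.
        self.register_buffer(
            "prototypes", F.normalize(torch.randn(num_classes, feat_dim), dim=1))

    @torch.no_grad()
    def update_prototypes(self, feats: torch.Tensor, labels: torch.Tensor) -> None:
        # EMA update of each class prototype toward the batch mean of its normalized features.
        for c in labels.unique():
            mean_feat = feats[labels == c].mean(dim=0)
            proto = self.alpha * self.prototypes[c] + (1 - self.alpha) * mean_feat
            self.prototypes[c] = F.normalize(proto, dim=0)

    def forward(self, feats: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
        feats = F.normalize(feats, dim=1)          # project features onto the unit sphere
        self.update_prototypes(feats, labels)

        # Variation term: pull each feature toward its class prototype
        # (cross-entropy over cosine similarities to all prototypes).
        logits = feats @ self.prototypes.t() / self.tau
        loss_var = F.cross_entropy(logits, labels)

        # Separation term: penalize high similarity between distinct prototypes.
        # Note: with EMA-tracked prototype buffers this term carries no gradient here;
        # it is kept only to mirror the L_sep term named in the quoted loss.
        proto_sim = self.prototypes @ self.prototypes.t() / self.tau
        off_diag = ~torch.eye(proto_sim.size(0), dtype=torch.bool, device=proto_sim.device)
        loss_sep = torch.logsumexp(
            proto_sim[off_diag].view(proto_sim.size(0), -1), dim=1).mean()

        return self.lam * loss_var + loss_sep
```

In a training loop, such a module would consume the encoder's normalized penultimate features and the batch labels; the prototype update runs inside the forward pass under `torch.no_grad()`.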
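The "Dataset Splits" row quotes the leave-one-domain-out protocol with a validation set pooled from the training domains. The helper below is a hedged illustration of that protocol only (it is not the DomainBed or HYPO code); the PACS-style domain names, the toy placeholder data, and the 20% validation fraction are assumptions.

```python
import random


def leave_one_domain_out_splits(domain_data, val_fraction=0.2, seed=0):
    """Enumerate leave-one-domain-out splits with a pooled training-domain validation set.

    domain_data: dict mapping a domain name to its list of examples.
    For each held-out (test) domain, the remaining domains form the training pool;
    a fraction of each training domain is pooled into a validation set used for
    model selection, as in the protocol quoted above (Gulrajani & Lopez-Paz, 2020).
    The 20% validation fraction is an assumption for illustration.
    """
    rng = random.Random(seed)
    for test_domain in domain_data:
        train, val = [], []
        for domain, examples in domain_data.items():
            if domain == test_domain:
                continue
            shuffled = examples[:]
            rng.shuffle(shuffled)
            n_val = int(len(shuffled) * val_fraction)
            val.extend(shuffled[:n_val])        # validation pooled across training domains
            train.extend(shuffled[n_val:])
        yield test_domain, train, val, domain_data[test_domain]


# Example with PACS-style domains and toy placeholder data.
pacs = {d: [f"{d}_img_{i}" for i in range(10)]
        for d in ["art_painting", "cartoon", "photo", "sketch"]}
for held_out, train, val, test in leave_one_domain_out_splits(pacs):
    print(held_out, len(train), len(val), len(test))
```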
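The hyperparameter search quoted in the setup row is a plain grid over learning rate, batch size, and λ. The loop below merely restates that grid as code; `train_and_validate` is a hypothetical placeholder standing in for a full training run scored on the pooled training-domain validation set.

```python
from itertools import product

# Search space quoted in the experiment-setup row above.
learning_rates = [0.005, 0.002, 0.001, 0.0005, 0.0002, 0.0001, 0.00005]
batch_sizes = [32, 64]
loss_weights = [1.0, 2.0, 4.0]   # lambda in L = lambda * L_var + L_sep


def train_and_validate(lr, bs, lam):
    """Hypothetical placeholder: train with these hyperparameters and return
    accuracy on the pooled training-domain validation set."""
    raise NotImplementedError


best = None
for lr, bs, lam in product(learning_rates, batch_sizes, loss_weights):
    score = train_and_validate(lr, bs, lam)
    if best is None or score > best[0]:
        best = (score, lr, bs, lam)
```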