Predicting Out-of-Distribution Error with the Projection Norm

Authors: Yaodong Yu, Zitong Yang, Alexander Wei, Yi Ma, Jacob Steinhardt

ICML 2022

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | Empirically, our approach outperforms existing methods on both image and text classification tasks and across different network architectures. ... We evaluate the ProjNorm algorithm on several out-of-distribution datasets in the vision and language domains. We first compare our method with existing methods and demonstrate its effectiveness (Section 3.1). Next, we study the sensitivity of ProjNorm to hyperparameters and dataset size (Section 3.2). |
| Researcher Affiliation | Academia | University of California, Berkeley. |
| Pseudocode | Yes | Algorithm 1 provides a detailed description of the ProjNorm algorithm. |
| Open Source Code | Yes | Our code is available at https://github.com/yaodongyu/ProjNorm. |
| Open Datasets | Yes | We evaluate each method we consider on the image classification tasks CIFAR10, CIFAR100 (Krizhevsky et al., 2009) and the natural language inference task MNLI (Williams et al., 2017). |
| Dataset Splits | Yes | For the CIFAR datasets, we fine-tune using SGD with learning rate 10⁻³, momentum 0.9, and cosine learning rate decay (Loshchilov & Hutter, 2016). For MNLI, we use AdamW (Loshchilov & Hutter, 2017) with learning rate 2×10⁻⁵ and linear learning rate decay. For computing ProjNorm, we apply the same optimizer as fine-tuning on each dataset and use the pre-trained model weights as the initialization θ0. The default number of training iterations for ProjNorm is 1000. ... in-distribution validation samples. |
| Hardware Specification | No | The paper does not specify the hardware (e.g., CPU or GPU models, or cloud computing instances) used to run the experiments; it only mentions using pre-trained models. |
| Software Dependencies | No | The paper mentions optimizers (SGD, AdamW) and specific models (ResNet, VGG, BERT, RoBERTa), but does not give version numbers for any software dependencies such as Python, PyTorch, TensorFlow, CUDA, or other libraries. |
| Experiment Setup | Yes | For the CIFAR datasets, we fine-tune using SGD with learning rate 10⁻³, momentum 0.9, and cosine learning rate decay (Loshchilov & Hutter, 2016). For MNLI, we use AdamW (Loshchilov & Hutter, 2017) with learning rate 2×10⁻⁵ and linear learning rate decay. For computing ProjNorm, we apply the same optimizer as fine-tuning on each dataset and use the pre-trained model weights as the initialization θ0. The default number of training iterations for ProjNorm is 1000. |
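
For context on the Pseudocode and Experiment Setup rows above: ProjNorm pseudo-labels the out-of-distribution test set with the in-distribution classifier, fine-tunes a fresh copy of the pre-trained network (initialization θ0) on those pseudo-labels, and reports the Euclidean distance between the two resulting parameter vectors. Below is a minimal PyTorch sketch of that procedure under the quoted CIFAR settings (SGD, learning rate 10⁻³, momentum 0.9, cosine decay, 1000 iterations). The function name, training loop, and data handling are our own assumptions for illustration, not the authors' released implementation.

```python
import copy
import torch

def proj_norm(pretrained_model, reference_model, ood_loader,
              num_iters=1000, lr=1e-3, momentum=0.9, device="cuda"):
    """Sketch of ProjNorm (Yu et al., 2022, Algorithm 1), under our assumptions.

    reference_model: the classifier already fine-tuned on in-distribution data.
    pretrained_model: the same architecture at its pre-trained initialization
    theta_0; a copy of it is fine-tuned here on pseudo-labeled OOD data.
    """
    reference_model = reference_model.to(device).eval()
    projected = copy.deepcopy(pretrained_model).to(device).train()

    # Quoted CIFAR setup: SGD (lr 1e-3, momentum 0.9) with cosine decay.
    optimizer = torch.optim.SGD(projected.parameters(), lr=lr, momentum=momentum)
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=num_iters)
    criterion = torch.nn.CrossEntropyLoss()

    it = 0
    while it < num_iters:
        for x, _ in ood_loader:  # true OOD labels are never used
            x = x.to(device)
            with torch.no_grad():
                pseudo_y = reference_model(x).argmax(dim=1)  # pseudo-labels
            optimizer.zero_grad()
            loss = criterion(projected(x), pseudo_y)
            loss.backward()
            optimizer.step()
            scheduler.step()
            it += 1
            if it >= num_iters:
                break

    # ProjNorm = Euclidean distance between the two parameter vectors.
    ref_vec = torch.nn.utils.parameters_to_vector(reference_model.parameters())
    proj_vec = torch.nn.utils.parameters_to_vector(projected.parameters())
    return torch.norm(ref_vec - proj_vec).item()
```

Per the paper's thesis, a larger ProjNorm indicates a larger shift from the training distribution and correlates with higher test error on the OOD set.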
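
The MNLI experiments instead quote AdamW with learning rate 2×10⁻⁵ and linear learning-rate decay. A hedged sketch of that optimizer configuration follows; the stand-in model and iteration count are hypothetical (the paper fine-tunes BERT and RoBERTa).

```python
import torch

# Hypothetical stand-in; the paper fine-tunes BERT/RoBERTa encoders on MNLI.
model = torch.nn.Linear(768, 3)  # MNLI is a 3-class task
num_iters = 1000                 # assumed horizon, matching the ProjNorm default

# Quoted MNLI configuration: AdamW, learning rate 2e-5, linear decay to zero.
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
scheduler = torch.optim.lr_scheduler.LinearLR(
    optimizer, start_factor=1.0, end_factor=0.0, total_iters=num_iters
)
```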