Agreement-on-the-line: Predicting the Performance of Neural Networks under Distribution Shift
Authors: Christina Baek, Yiding Jiang, Aditi Raghunathan, J. Zico Kolter
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We study the ID vs. OOD accuracy and agreement between pairs of models across more than 20 common OOD benchmarks and hundreds of independently trained neural networks. We present results on 8 dataset shifts in the main paper, and include results for other distribution shifts in Appendix C. In Table 2, we observe that ALine-D generally outperforms other methods on datasets where agreement-on-the-line holds. |
| Researcher Affiliation | Collaboration | Christina Baek1 Yiding Jiang1 Aditi Raghunathan1 Zico Kolter1,2 1Carnegie Mellon University, 2Bosch Center for AI |
| Pseudocode | Yes | Algorithm 1 ALine-D: Predicting OOD Accuracy |
| Open Source Code | Yes | Implementation of our method is available at https://github.com/kebaek/agreement-on-the-line. |
| Open Datasets | Yes | Datasets. We present results on 8 dataset shifts in the main paper, and include results for other distribution shifts in Appendix C. These 8 datasets span: 1. Dataset reproductions: CIFAR-10.1 [67], CIFAR-10.2 [52] reproductions of CIFAR-10 [43] and ImageNetV2 [67] reproduction of ImageNet [21] |
| Dataset Splits | No | The paper mentions using a 'labeled validation set' but does not specify the exact size, percentages, or split methodology for these validation sets across the various datasets, making it difficult to reproduce the data partitioning precisely. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU model, CPU type, memory) used to run the experiments. It only mentions 'hundreds of independently trained neural networks' and models from a 'testbed'. |
| Software Dependencies | No | The paper mentions evaluating models from the 'timm [82]' package ('PyTorch Image Models') but does not provide version numbers for it or for other software dependencies such as PyTorch itself, Python, or CUDA. |
| Experiment Setup | No | The paper mentions 'probit scaling' (a transformation applied to accuracy and agreement values; see the first sketch below the table) and 'temperature scaling' (applied to model logits), but it does not specify concrete training hyperparameters (e.g., learning rate, batch size, number of epochs, optimizer details) for the models used in the experiments. |
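
The Research Type row quotes the paper's core measurement: per-model accuracy and pairwise agreement on ID and OOD test sets, with the probit scaling flagged in the Experiment Setup row applied before fitting linear trends. Below is a minimal sketch of those quantities; the array names (`preds`, `labels`) are illustrative placeholders, not identifiers from the released repository.

```python
# Sketch of the paper's basic measurements, assuming we already have each
# model's per-example class predictions on held-out ID and OOD test sets.
from itertools import combinations

import numpy as np
from scipy.stats import norm


def accuracy(preds, labels):
    """Fraction of examples a model classifies correctly (needs labels)."""
    return np.mean(preds == labels)


def agreement(preds_a, preds_b):
    """Fraction of examples on which two models predict the same class.
    No labels are needed, so this is computable on unlabeled OOD data."""
    return np.mean(preds_a == preds_b)


def probit(p, eps=1e-6):
    """Probit scaling: the inverse CDF of the standard normal, Phi^{-1}(p).
    Clipping avoids infinities at exactly 0 or 1."""
    return norm.ppf(np.clip(p, eps, 1 - eps))


def pairwise_agreements(preds):
    """Agreement for every unordered pair drawn from a list of per-model
    prediction arrays."""
    return np.array([agreement(preds[i], preds[j])
                     for i, j in combinations(range(len(preds)), 2)])
```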
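The Pseudocode row points to Algorithm 1 (ALine-D), which solves a least-squares system over model pairs; that full system is not reconstructed here. The hedged sketch below shows only the simpler single-model ALine-S-style estimate the paper describes alongside it, assuming the slope and bias fit on probit-scaled agreements transfer to probit-scaled accuracies.

```python
# Sketch of the agreement-line fit and the ALine-S-style OOD accuracy
# estimate, under the assumption that agreement-on-the-line holds.
# This is a reconstruction for illustration, not the authors' exact code.
import numpy as np
from scipy.stats import norm


def fit_agreement_line(agr_id, agr_ood, eps=1e-6):
    """Fit probit(OOD agreement) = a * probit(ID agreement) + b over all
    model pairs. Agreement requires no labels, so a and b are obtained
    without any labeled OOD data."""
    x = norm.ppf(np.clip(agr_id, eps, 1 - eps))
    y = norm.ppf(np.clip(agr_ood, eps, 1 - eps))
    a, b = np.polyfit(x, y, deg=1)
    return a, b


def predict_ood_accuracy(acc_id, a, b, eps=1e-6):
    """ALine-S-style prediction: map each model's probit-scaled ID accuracy
    through the fitted agreement line, then back through the normal CDF."""
    x = norm.ppf(np.clip(acc_id, eps, 1 - eps))
    return norm.cdf(a * x + b)
```

The design rests on the paper's central observation: when accuracy and agreement lie on the same line in probit space, the slope and bias estimated from (unlabeled) OOD agreement can be reused to map labeled ID accuracy to an OOD accuracy estimate, which is why no labeled OOD data is required.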