Great Models Think Alike: Improving Model Reliability via Inter-Model Latent Agreement
Authors: Ailin Deng, Miao Xiong, Bryan Hooi
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Theoretical analysis and extensive experiments on failure detection across various datasets verify the effectiveness of our method on both in-distribution and out-of-distribution settings. We conduct extensive experiments on failure detection to verify the benefits of our framework to improve model reliability and provide theoretical justification for our method. |
| Researcher Affiliation | Academia | 1School of Computing, National University of Singapore, Singapore 2Institute of Data Science, National University of Singapore, Singapore. Correspondence to: Ailin Deng <ailin@u.nus.edu>. |
| Pseudocode | Yes | We summarize our framework in Algorithm 1 in the Appendix: Algorithm 1, Inter-model Latent Agreement. (An illustrative sketch of the idea follows this table.) |
| Open Source Code | Yes | Our code is available via https://github.com/d-ailin/latent-agreement |
| Open Datasets | Yes | We run experiments on six in-distribution datasets and five distribution shifts to evaluate the failure detection performance. For in-distribution, we use CIFAR10 (Krizhevsky et al.), CIFAR100, STL (Coates et al., 2011), BIRDS (Wah et al., 2011), FOOD (Bossard et al., 2014) and a large-scale dataset, ImageNet (ImageNet-1K) (Deng et al., 2009). |
| Dataset Splits | Yes | Table 3 (number of images per dataset and associated splits). CIFAR10: 10 classes, 50000 train / 1000 val. / 9000 test; CIFAR100: 100 classes, 50000 train / 1000 val. / 9000 test; BIRDS: 200 classes, 5994 train / 2897 val. / 2897 test; STL: 10 classes, 5000 train / 4000 val. / 4000 test, 100000 unlabeled; FOOD: 102 classes, 75750 train / 12625 val. / 12625 test; ImageNet: 1000 classes, 1281167 train / 10000 val. / 40000 test, no unlabeled set. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU models, CPU types, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions using PyTorch Image Models but does not provide specific version numbers for PyTorch, Python, or other software dependencies required for reproducibility. |
| Experiment Setup | Yes | Section 4.1, Experimental Setup; Section A.2, Training Recipe: For ResNet-50 models, we fine-tune with the Adam optimizer with learning rate 1e-4 and (β1, β2) = (0.9, 0.99). For ViT, we fine-tuned with a cosine annealing scheduler; details are shown in Table 4 (training parameters per dataset for ViT; init-lr: initial learning rate of the cosine annealing scheduler as selected; steps: number of batches trained on). Section A.4, Hyperparameters: We have training set size n and neighborhood size k as hyperparameters. For main results, except for the ablation study, we use n = 10000 across all datasets... We select k ∈ {10, 20, 50, 100, 200, 500, 1000} with optimal AUROC performance on the validation split for each dataset. (Hedged sketches of this setup follow the table.) |
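The report only names Algorithm 1 (Inter-model Latent Agreement) without reproducing it. As a rough illustration of the idea, here is a minimal sketch assuming the score measures how much a test sample's k-nearest-neighbor set, computed over a reference set in each model's latent space, overlaps between the main model and an auxiliary model. All function and variable names are illustrative, and this hard-overlap variant may differ from the paper's actual formulation; see the linked repository for the authors' implementation.

```python
# Hedged sketch: inter-model latent agreement as k-NN set overlap.
import numpy as np

def knn_indices(query: np.ndarray, reference: np.ndarray, k: int) -> np.ndarray:
    """Indices of the k nearest reference embeddings by cosine similarity."""
    q = query / np.linalg.norm(query, axis=1, keepdims=True)
    r = reference / np.linalg.norm(reference, axis=1, keepdims=True)
    sims = q @ r.T                        # (num_queries, n) similarities
    return np.argsort(-sims, axis=1)[:, :k]

def latent_agreement(z_main, z_aux, ref_main, ref_aux, k=50):
    """Fraction of shared k-NN reference indices across the two latent spaces."""
    nn_main = knn_indices(z_main, ref_main, k)
    nn_aux = knn_indices(z_aux, ref_aux, k)
    scores = [len(set(a) & set(b)) / k for a, b in zip(nn_main, nn_aux)]
    return np.array(scores)               # higher = more inter-model agreement

# Toy usage: 5 test points, a reference set of 1000 points, 64/128-dim latents.
rng = np.random.default_rng(0)
ref_main, ref_aux = rng.normal(size=(1000, 64)), rng.normal(size=(1000, 128))
z_main, z_aux = rng.normal(size=(5, 64)), rng.normal(size=(5, 128))
print(latent_agreement(z_main, z_aux, ref_main, ref_aux, k=50))
```

Note that the latent dimensions of the two models need not match, since agreement is measured over neighbor identities rather than raw coordinates.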
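For the training setup quoted in the Experiment Setup row, a minimal configuration sketch follows, assuming timm (PyTorch Image Models, which the paper mentions) for model construction. The quote does not name the ViT optimizer, and the per-dataset init-lr and step counts live in the paper's Table 4, so those values below are placeholders.

```python
import timm
import torch

# ResNet-50 fine-tuning as quoted from Appendix A.2: Adam, lr 1e-4,
# betas (0.9, 0.99). num_classes=10 is a placeholder (e.g., CIFAR10).
resnet = timm.create_model("resnet50", pretrained=False, num_classes=10)
resnet_opt = torch.optim.Adam(resnet.parameters(), lr=1e-4, betas=(0.9, 0.99))

# ViT fine-tuning uses a cosine annealing scheduler; optimizer choice, the
# initial lr, and T_max (schedule length in steps) are assumptions here,
# since the real per-dataset values are in the paper's Table 4.
vit = timm.create_model("vit_base_patch16_224", pretrained=False, num_classes=10)
vit_opt = torch.optim.Adam(vit.parameters(), lr=1e-4)
vit_sched = torch.optim.lr_scheduler.CosineAnnealingLR(vit_opt, T_max=10_000)

# In a training loop, call vit_opt.step() each batch, then vit_sched.step().
```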
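Finally, the paper selects the neighborhood size k from {10, 20, 50, 100, 200, 500, 1000} by validation AUROC. A small sketch of that sweep, building on the illustrative `latent_agreement` function above; `val_errors` (a boolean array marking the main model's validation misclassifications) and the scoring direction are assumptions, not the authors' code.

```python
from sklearn.metrics import roc_auc_score

def select_k(z_main, z_aux, ref_main, ref_aux, val_errors,
             grid=(10, 20, 50, 100, 200, 500, 1000)):
    """Pick the k whose agreement score best detects failures (max AUROC)."""
    best_k, best_auroc = None, -1.0
    for k in grid:
        scores = latent_agreement(z_main, z_aux, ref_main, ref_aux, k=k)
        # Assumption: agreement is high on correct predictions, low on failures,
        # so we score correctness (~val_errors) with the agreement values.
        auroc = roc_auc_score(~val_errors, scores)
        if auroc > best_auroc:
            best_k, best_auroc = k, auroc
    return best_k, best_auroc
```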