Constraining Representations Yields Models That Know What They Don't Know
Authors: João Monteiro, Pau Rodríguez, Pierre-André Noël, Issam H. Laradji, David Vázquez
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We provide further empirical evidence that TAC works well on multiple types of architectures and data modalities and that it is at least as good as state-of-the-art alternative confidence scores derived from existing models. [...] Evaluations are split into three main parts: Section 3.1: We start with a proof-of-concept and show that TAC can match activation patterns defined by class codes. We further show that small norm attackers are not able to match codes as well as clean data, rendering the distance between activation profiles and codes a good confidence score. Section 3.2: We then proceed to the main evaluation and use TAC as an add-on to existing classifiers. In this case, we evaluate performance under the rejection setting and show TAC to improve upon the base classifier. We further evaluate TAC when used to detect test data from unseen classes. Section 3.3: We seek additional applications of TAC and put it to test as a robust surrogate to the base classifier. |
| Researcher Affiliation | Industry | João Monteiro, Pau Rodríguez, Pierre-André Noël, Issam Laradji, David Vázquez ServiceNow Research {First Name.Last Name}@servicenow.com Currently at Apple. |
| Pseudocode | Yes | Figure 16 in the Appendix shows a Pytorch (Paszke et al., 2019) implementation of feature slicing and activation profile computation. [...] We present Python code snippets in Figure 15 showing that our choice of threshold used to compute the detection rate matches that of the standard Equal Error Rate. [...] In Figure 16, we show an example of an implementation of TAC's slice and reduce operations on top of 2-dimensional features. [...] Figure 17: Pytorch implementation of Mixup interpolations. |
| Open Source Code | No | The paper mentions 'code snippets of critical components are displayed in Figures 16 and 17' in the reproducibility statement, but it does not state that the full source code for the methodology is openly available or provide a link to a repository. |
| Open Datasets | Yes | To test for whether commonly used models are able to match activation patterns given by class codes, we train a TAC'ed WideResNet-28-10 (Madry et al., 2017) on CIFAR-10 (Krizhevsky et al., 2009)... [...] We consider intent prediction tasks of DialoGLUE (Mehri et al., 2020). Namely, we conduct experiments on HWU64 (Liu et al., 2021), Banking77 (Casanueva et al., 2020), and CLINC150 (Larson et al., 2019)... [...] We pre-trained a ViT Base-16x16 (Dosovitskiy et al., 2020) as the base predictor, and TAC operations are performed in 13 different layers throughout the model. We perform k-fold (k = 5) random splits on the validation set of ImageNet and, for a given split and value of ω, we then use the k − 1 left-out splits to select the confidence rejection threshold that maximizes V. |
| Dataset Splits | Yes | We perform k-fold (k = 5) random splits on the validation set of ImageNet and, for a given split and value of ω, we then use the k − 1 left-out splits to select the confidence rejection threshold that maximizes V. Curves averaged over splits are plotted for the data used for threshold selection (indicated as train in the plot) as well as for the left-out splits. |
| Hardware Specification | No | The paper states 'both training and evaluation across all applications we considered were performed in single-GPU hardware.' and 'it takes only a couple of hours to train TAC on a single GPU.' However, no specific GPU model (e.g., NVIDIA A100, RTX 3090, etc.) or any other detailed hardware specifications are provided. |
| Software Dependencies | No | The paper mentions 'Pytorch (Paszke et al., 2019)', 'Foolbox', and 'Torchvision' as tools used. While PyTorch is cited, specific version numbers for any of these software dependencies (e.g., 'Pytorch 1.x' or 'Foolbox vX.Y') are not explicitly provided in the text, which is required for reproducibility. |
| Experiment Setup | Yes | Training was performed with Adam in all cases except for models trained on MNIST and CIFAR-10, where SGD with momentum was employed. [...] Overall, we noticed that TAC tends to perform better when weight decay is not applied or when its coefficient is set to very small values (< 10⁻⁵). [...] The perturbation budget given to attackers in each case was 0.05, 0.02, and 0.1 for FGSM, PGD, and CW respectively. [...] We considered 5 projection configurations named small, large, very-large, x-large, and 2x-large, and the choice amongst those options is treated as a hyperparameter to be selected with cross-validation for each dataset we trained on. The numbers of fully connected layers for each configuration are 1, 2, 3, 4, and 5. |
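
The Pseudocode row above notes that the paper implements "feature slicing and activation profile computation" (Figure 16) and uses the distance between activation profiles and class codes as a confidence score. The paper's PyTorch code is not reproduced here; a minimal NumPy sketch of the general slice-and-reduce idea, with all function names and the mean-reduction choice being illustrative assumptions, might look like:

```python
import numpy as np

def activation_profile(features, n_slices):
    # Split a 1-D feature vector into contiguous slices and reduce each
    # slice to a scalar (mean here, as an assumed reduction), yielding
    # an activation profile of length n_slices.
    slices = np.array_split(features, n_slices)
    return np.array([s.mean() for s in slices])

def confidence(features, class_code, n_slices):
    # Confidence score: negative Euclidean distance between the
    # activation profile and the class code. Larger (closer to zero)
    # means the activations match the code more closely.
    profile = activation_profile(features, n_slices)
    return -np.linalg.norm(profile - class_code)
```

Under this sketch, clean inputs whose activations match their class code score higher than inputs whose profiles drift away from it, which is how a rejection threshold on the score can separate the two.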
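
The Pseudocode row also cites Figure 15, which checks that the paper's detection-rate threshold matches the standard Equal Error Rate. The paper's snippet is not reproduced here; a generic EER-threshold search (a sketch, with a simple candidate scan rather than whatever the authors implemented) can be written as:

```python
import numpy as np

def eer_threshold(pos_scores, neg_scores):
    # Scan candidate thresholds and return the one where the
    # false-reject rate on positives (in-distribution / clean data)
    # is closest to the false-accept rate on negatives.
    candidates = np.sort(np.concatenate([pos_scores, neg_scores]))
    best_t, best_gap = candidates[0], np.inf
    for t in candidates:
        frr = np.mean(pos_scores < t)   # positives rejected at t
        far = np.mean(neg_scores >= t)  # negatives accepted at t
        gap = abs(frr - far)
        if gap < best_gap:
            best_gap, best_t = gap, t
    return best_t
```

The brute-force scan is O(n²) but transparent; with sorted scores the same threshold can be found in one pass, which matters only for large validation sets.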