Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
DORA: Exploring Outlier Representations in Deep Neural Networks
Authors: Kirill Bykov, Mayukh Deb, Dennis Grinwald, Klaus-Robert Müller, Marina M.-C. Höhne
TMLR 2023 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We validate the EA metric quantitatively, demonstrating its effectiveness both in controlled scenarios and real-world applications. Lastly, through practical experiments conducted on popular Computer Vision models, we reveal that anomalous representations identified by our framework often correspond to undesirable spurious concepts. To quantitatively evaluate the alignment, we compared human-defined semantic distances between concepts, which we refer to as semantic baselines, with distance matrices computed between representations trained to learn these concepts. |
| Researcher Affiliation | Collaboration | Klaus-Robert Müller — Machine Learning Group, Technical University of Berlin, Berlin, Germany; BIFOLD Berlin Institute for the Foundations of Learning and Data, Berlin, Germany; Department of Artificial Intelligence, Korea University, Seoul 136-713, Korea; Max Planck Institut für Informatik, 66123 Saarbrücken, Germany; Google Research, Brain Team, Berlin, Germany |
| Pseudocode | No | The paper includes mathematical definitions and formulas but does not contain any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | PyTorch implementation of the proposed method can be found by the following link: https://github.com/lapalap/dora. |
| Open Datasets | Yes | For our study, we utilized two prevalent computer vision datasets, namely ILSVRC-2012 Russakovsky et al. (2015) and CIFAR-100 Krizhevsky (2009). The combined dataset comprised the Tiny ImageNet Le and Yang (2015), containing 200 ImageNet classes, and the MNIST handwritten-numbers dataset Deng (2012), containing 10 handwritten numbers, resulting in a total of 210 classes. |
| Dataset Splits | Yes | The data set itself consists of 224,316 training, 200 validation, and 500 test data points. For our empirical analysis, we utilized a pre-trained ResNet18 model on ImageNet, along with the ILSVRC-2012 validation set consisting of 50,000 images and 1,000 classes, employed for the data-aware metrics. |
| Hardware Specification | No | All described experiments, if not stated otherwise, were performed on the Google Colab Pro Bisong and Bisong (2019) environment with the GPU accelerator. This statement is too general and does not specify exact GPU models, CPU models, or memory details. |
| Software Dependencies | No | The paper mentions software components like "PyTorch implementation", "NLTK package", "Torchvision library", "pytorch-vision-models library", "Pytorch-cifar100 GitHub repository", and "Lucent library" but does not provide specific version numbers for any of them. |
| Experiment Setup | Yes | We computed functional distances with optimal hyperparameters found in Section 5.1, including Minkowski p = 1, Pearson, Spearman, EAn with n = 50, d = 200, and EAs with n = 3, m = 500, on the output logit layer for each model. |
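The Experiment Setup row quotes pairwise functional distances (Minkowski with p = 1, Pearson, Spearman) computed between representations on the output logit layer. Below is a minimal sketch of that kind of distance-matrix computation; it is not the paper's DORA implementation (https://github.com/lapalap/dora), and the array shape and the conversion of correlations to distances via 1 − r are assumptions for illustration.

```python
import numpy as np
from scipy.spatial.distance import minkowski
from scipy.stats import pearsonr, spearmanr

def distance_matrices(acts):
    """Compute pairwise distance matrices between representation activations.

    acts: array of shape (n_units, n_samples) — each row holds one output
    unit's responses over a probe set (assumed layout, for illustration).
    Returns Minkowski (p=1), Pearson, and Spearman distance matrices.
    """
    n = acts.shape[0]
    d_mink = np.zeros((n, n))
    d_pear = np.zeros((n, n))
    d_spear = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            # Minkowski distance with p = 1 (L1 / Manhattan distance).
            d_mink[i, j] = minkowski(acts[i], acts[j], p=1)
            # Correlations mapped to distances as 1 - r (an assumption here).
            d_pear[i, j] = 1.0 - pearsonr(acts[i], acts[j])[0]
            d_spear[i, j] = 1.0 - spearmanr(acts[i], acts[j])[0]
    return d_mink, d_pear, d_spear
```

Each returned matrix is symmetric with a zero diagonal, so any of them can be fed to downstream clustering or outlier detection over representations.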