Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

DORA: Exploring Outlier Representations in Deep Neural Networks

Authors: Kirill Bykov, Mayukh Deb, Dennis Grinwald, Klaus-Robert Müller, Marina M.-C. Höhne

TMLR 2023 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental We validate the EA metric quantitatively, demonstrating its effectiveness both in controlled scenarios and real-world applications. Lastly, through practical experiments conducted on popular Computer Vision models, we reveal that anomalous representations identified by our framework often correspond to undesirable spurious concepts. To quantitatively evaluate the alignment, we compared human-defined semantic distances between concepts, which we refer to as semantic baselines, with distance matrices computed between representations trained to learn these concepts.
Researcher Affiliation Collaboration Klaus-Robert Müller EMAIL; Machine Learning Group, Technical University of Berlin, Berlin, Germany; BIFOLD Berlin Institute for the Foundations of Learning and Data, Berlin, Germany; Department of Artificial Intelligence, Korea University, Seoul 136-713, Korea; Max Planck Institut für Informatik, 66123 Saarbrücken, Germany; Google Research, Brain Team, Berlin, Germany
Pseudocode No The paper includes mathematical definitions and formulas but does not contain any clearly labeled pseudocode or algorithm blocks.
Open Source Code Yes PyTorch implementation of the proposed method can be found at the following link: https://github.com/lapalap/dora .
Open Datasets Yes For our study, we utilized two prevalent computer vision datasets, namely ILSVRC-2012 Russakovsky et al. (2015) and CIFAR-100 Krizhevsky (2009). The combined dataset comprised the Tiny ImageNet Le and Yang (2015), containing 200 ImageNet classes, and the MNIST handwritten-numbers dataset Deng (2012), containing 10 handwritten numbers, resulting in a total of 210 classes.
Dataset Splits Yes The dataset itself consists of 224,316 training, 200 validation, and 500 test data points. For our empirical analysis, we utilized a pre-trained ResNet18 model on ImageNet, along with the ILSVRC-2012 validation set consisting of 50,000 images and 1,000 classes, employed for the data-aware metrics.
Hardware Specification No All described experiments, if not stated otherwise, were performed on the Google Colab Pro Bisong and Bisong (2019) environment with the GPU accelerator. This statement is too general and does not specify exact GPU models, CPU models, or memory details.
Software Dependencies No The paper mentions software components like "PyTorch implementation", "NLTK package", "Torchvision library", "pytorch-vision-models library", "pytorch-cifar100 GitHub repository", and "Lucent library" but does not provide specific version numbers for any of them.
Experiment Setup Yes We computed functional distances with optimal hyperparameters found in Section 5.1, including Minkowski p = 1, Pearson, Spearman, EA_n with n = 50, d = 200, and EA_s with n = 3, m = 500, on the output logit layer for each model.
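The setup above compares neuron representations via pairwise functional distances. As a minimal sketch of the standard distance measures named in that row (Minkowski with p = 1, Pearson, Spearman), the snippet below computes a distance matrix over representation vectors with NumPy; the function names and the toy vectors are illustrative assumptions, and the paper's own EA (Extreme-Activation) distance is not reproduced here:

```python
import numpy as np

def minkowski_dist(a, b, p=1):
    # Minkowski distance; p = 1 is the Manhattan distance used in the setup above
    return np.sum(np.abs(a - b) ** p) ** (1.0 / p)

def pearson_dist(a, b):
    # 1 - Pearson correlation: perfectly correlated vectors get distance 0
    return 1.0 - np.corrcoef(a, b)[0, 1]

def spearman_dist(a, b):
    # Spearman correlation distance = Pearson distance computed on ranks
    rank = lambda x: np.argsort(np.argsort(x)).astype(float)
    return pearson_dist(rank(a), rank(b))

def distance_matrix(reps, dist_fn):
    # Pairwise distance matrix over a list of representation vectors
    n = len(reps)
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            D[i, j] = dist_fn(reps[i], reps[j])
    return D
```

Such a matrix can then be compared against a human-defined semantic baseline, as the report's Research Type row describes.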