Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.
Representational Difference Explanations
Authors: Neehar Kondapaneni, Oisin Mac Aodha, Pietro Perona
NeurIPS 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We validate our method, which we call Representational Differences Explanations (RDX), by using it to compare models with known conceptual differences and demonstrate that it recovers meaningful distinctions where existing explainable AI (XAI) techniques fail. Applied to state-of-the-art models on challenging subsets of the ImageNet and iNaturalist datasets, RDX reveals both insightful representational differences and subtle patterns in the data. |
| Researcher Affiliation | Academia | Neehar Kondapaneni¹, Oisin Mac Aodha², Pietro Perona¹ (¹Caltech, ²University of Edinburgh) |
| Pseudocode | Yes | Algorithm 1: Evaluation of Explanations on Representation A. ... Algorithm 1 Selecting Image and Neighbors with Maximum KNA |
| Open Source Code | No | Code will be released upon acceptance. ... Code will be published provided the work is accepted. |
| Open Datasets | Yes | Applied to state-of-the-art models on challenging subsets of the ImageNet and iNaturalist datasets... modified MNIST dataset... CUB dataset [68] |
| Dataset Splits | Yes | We randomly sample 70% of the embeddings in our dataset to train the transformation matrix. The other 30% are used as a validation set. ... For all experiments, we use images from the train split because the train split is usually larger. ... For each base dataset, we compare clusterings across the three in-distribution variants using the pairs: (base, -20%), (base, +20%), and (-20%, +20%). |
| Hardware Specification | Yes | All experiments were conducted using on a machine with an AMD Ryzen 7 3700X 8-Core Processor and a single GeForce RTX 4090 GPU with 128GB of RAM. |
| Software Dependencies | No | We use the Adam [27] optimizer... pytorchcv [60]... scikit-learn [47]... pymf [63] implementation... Overcomplete repository [25]... HDBSCAN [39]. |
| Experiment Setup | Yes | We use the Adam [27] optimizer with the learning rate set to 1e-2 and a one-cycle learning rate schedule. The global seed is set to 4834586. ... The learning rate is set to 1e-3 and the model is trained for a maximum of 10000 iterations with a batch size of 64. ... The sparsity coefficient is set to 0.0004. ... We sweep γ on one comparison from each experiment group ... and select the value that results in the highest performance on BSR. We find that a γ of 0.05 or 0.1 works well. We set β to 5 in all experiments. |
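
The Dataset Splits row quotes a random 70%/30% train/validation split of the embeddings, and the Experiment Setup row quotes a global seed of 4834586. A minimal sketch of such a seeded split is shown below, using only the Python standard library. The function name `split_indices`, the constant names, and the choice to drive the split with the global seed are assumptions for illustration; the paper's own code is not yet released and may split the data differently.

```python
import random

GLOBAL_SEED = 4834586   # seed value quoted in the Experiment Setup row
TRAIN_FRACTION = 0.7    # 70% train / 30% validation, per the Dataset Splits row


def split_indices(num_embeddings, train_fraction=TRAIN_FRACTION, seed=GLOBAL_SEED):
    """Hypothetical helper: return (train_idx, val_idx) as disjoint sorted lists.

    Assumption: the split is a uniform random permutation of row indices,
    seeded for repeatability. The source only states the 70/30 proportions.
    """
    rng = random.Random(seed)              # local RNG, global state is untouched
    indices = list(range(num_embeddings))
    rng.shuffle(indices)                   # in-place random permutation
    cut = int(round(train_fraction * num_embeddings))
    return sorted(indices[:cut]), sorted(indices[cut:])


# Example: split the row indices of 1000 embeddings.
train_idx, val_idx = split_indices(1000)
print(len(train_idx), len(val_idx))
```

For 1000 embeddings this yields 700 training indices and 300 validation indices, and calling the function again with the same seed returns the same partition, which is the property a reported seed is meant to provide.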