Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Representational Difference Explanations

Authors: Neehar Kondapaneni, Oisin Mac Aodha, Pietro Perona

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We validate our method, which we call Representational Differences Explanations (RDX), by using it to compare models with known conceptual differences and demonstrate that it recovers meaningful distinctions where existing explainable AI (XAI) techniques fail. Applied to state-of-the-art models on challenging subsets of the ImageNet and iNaturalist datasets, RDX reveals both insightful representational differences and subtle patterns in the data.
Researcher Affiliation	Academia	Neehar Kondapaneni¹, Oisin Mac Aodha², Pietro Perona¹; ¹Caltech, ²University of Edinburgh
Pseudocode	Yes	Algorithm 1: Evaluation of Explanations on Representation A. ... Algorithm 1 Selecting Image and Neighbors with Maximum KNA
Open Source Code	No	Code will be released upon acceptance. ... Code will be published provided the work is accepted.
Open Datasets	Yes	Applied to state-of-the-art models on challenging subsets of the ImageNet and iNaturalist datasets... modified MNIST dataset... CUB dataset [68]
Dataset Splits	Yes	We randomly sample 70% of the embeddings in our dataset to train the transformation matrix. The other 30% are used as a validation set. ... For all experiments, we use images from the train split because the train split is usually larger. ... For each base dataset, we compare clusterings across the three in-distribution variants using the pairs: (base, -20%), (base, +20%), and (-20%, +20%).
Hardware Specification	Yes	All experiments were conducted using on a machine with an AMD Ryzen 7 3700X 8-Core Processor and a single GeForce RTX 4090 GPU with 128GB of RAM.
Software Dependencies	No	We use the Adam [27] optimizer... pytorchcv [60]... scikit-learn [47]... pymf [63] implementation... Overcomplete repository [25]... HDBSCAN [39].
Experiment Setup	Yes	We use the Adam [27] optimizer with the learning rate set to 1e-2 and a one-cycle learning rate schedule. The global seed is set to 4834586. ... The learning rate is set to 1e-3 and the model is trained for a maximum of 10000 iterations with a batch size of 64. ... The sparsity coefficient is set to 0.0004. ... We sweep γ on one comparison from each experiment group ... and select the value that results in the highest performance on BSR. We find that a γ of 0.05 or 0.1 works well. We set β to 5 in all experiments.
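
The Dataset Splits row describes a random 70%/30% train/validation split of the embeddings, and the Experiment Setup row reports a global seed. A minimal sketch of such a seeded split is shown below, assuming the split operates on row indices. The function name `split_indices` is hypothetical, and reusing the reported seed for the split is an assumption made for illustration; the quoted text does not say how the authors implemented the sampling.

```python
import random

GLOBAL_SEED = 4834586  # seed value quoted in the Experiment Setup row
TRAIN_FRACTION = 0.7   # 70% train / 30% validation, per the Dataset Splits row


def split_indices(n_items, train_fraction=TRAIN_FRACTION, seed=GLOBAL_SEED):
    """Return (train_idx, val_idx): disjoint, sorted lists of row indices.

    Hypothetical helper for illustration; not the authors' code.
    """
    rng = random.Random(seed)  # local generator, global random state untouched
    indices = list(range(n_items))
    rng.shuffle(indices)
    n_train = int(round(train_fraction * n_items))
    return sorted(indices[:n_train]), sorted(indices[n_train:])


# Example: split 1000 embedding rows into train and validation indices.
train_idx, val_idx = split_indices(1000)
print(len(train_idx), len(val_idx))
```

Using a local `random.Random` instance keeps the split deterministic for a given seed without depending on, or altering, any other random state in the program.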