Towards Robust Metrics for Concept Representation Evaluation

Authors: Mateo Espinosa Zarlenga, Pietro Barbiero, Zohreh Shams, Dmitry Kazhdan, Umang Bhatt, Adrian Weller, Mateja Jamnik

AAAI 2023

Reproducibility Variable: Research Type
Result: Experimental
LLM Response: "We now give a brief account of the experimental setup and datasets, followed by highlighting the utility of our impurity metrics and their applications to model benchmarking. ... We compare the purity of concept representations in various methods using our metrics. ... Our results show that our metrics correctly capture the difference in impurity between the two representation sets in a statistically significant manner."

Reproducibility Variable: Researcher Affiliation
Result: Collaboration
LLM Response: 1. University of Cambridge; 2. Babylon Health; 3. The Alan Turing Institute

Reproducibility Variable: Pseudocode
Result: No
LLM Response: The paper does not contain any structured pseudocode or algorithm blocks.

Reproducibility Variable: Open Source Code
Result: No
LLM Response: "Appendices can be found in https://arxiv.org/abs/2301.10367." This link points to the paper's arXiv page, which does not explicitly state that source code for the described methodology is available.

Reproducibility Variable: Open Datasets
Result: Yes
LLM Response: "We construct tasks whose samples are fully described by a vector of ground truth generative factors. ... We first design a parametric binary-class dataset TabularToy(δ), a variation of the tabular dataset proposed by Mahinpei et al. (2021). We also construct two multiclass image-based parametric datasets: dSprites(λ) and 3dshapes(λ), based on the dSprites (Matthey et al. 2017) and 3dshapes (Burgess and Kim 2018) datasets, respectively. ... CUB dataset (Wah et al. 2011)"

Reproducibility Variable: Dataset Splits
Result: No
LLM Response: The paper mentions "Val Concept / Val Task / Test Concept / Test Task" in Figure 3 and refers to "task validation accuracy", implying that validation sets were used. However, the main text does not provide the specific percentages, sample counts, or references to predefined training/validation/test splits needed for reproduction.

Reproducibility Variable: Hardware Specification
Result: No
LLM Response: The paper does not provide any specific details about the hardware (e.g., GPU models, CPU types, memory) used to run the experiments.

Reproducibility Variable: Software Dependencies
Result: No
LLM Response: The paper does not list version numbers for any software dependencies (e.g., programming languages, libraries, or frameworks).

Reproducibility Variable: Experiment Setup
Result: Yes
LLM Response: "We include details on training and architecture hyperparameters in Appendix A.6."