Towards Robust Metrics for Concept Representation Evaluation

Authors: Mateo Espinosa Zarlenga, Pietro Barbiero, Zohreh Shams, Dmitry Kazhdan, Umang Bhatt, Adrian Weller, Mateja Jamnik

AAAI 2023

Reproducibility Variable: Research Type
Result: Experimental
LLM Response: "We now give a brief account of the experimental setup and datasets, followed by highlighting the utility of our impurity metrics and their applications to model benchmarking. ... We compare the purity of concept representations in various methods using our metrics. ... Our results show that our metrics correctly capture the difference in impurity between the two representation sets in a statistically significant manner."

Reproducibility Variable: Researcher Affiliation
Result: Collaboration
LLM Response: 1. University of Cambridge; 2. Babylon Health; 3. The Alan Turing Institute

Reproducibility Variable: Pseudocode
Result: No
LLM Response: The paper does not contain any structured pseudocode or algorithm blocks.

Reproducibility Variable: Open Source Code
Result: No
LLM Response: "Appendices can be found in https://arxiv.org/abs/2301.10367." This link points to the paper's arXiv page, which does not explicitly state that source code for the described methodology is available.

Reproducibility Variable: Open Datasets
Result: Yes
LLM Response: "We construct tasks whose samples are fully described by a vector of ground truth generative factors. ... We first design a parametric binary-class dataset TabularToy(δ), a variation of the tabular dataset proposed by Mahinpei et al. (2021). We also construct two multiclass image-based parametric datasets: dSprites(λ) and 3dshapes(λ), based on the dSprites (Matthey et al. 2017) and 3dshapes (Burgess and Kim 2018) datasets, respectively. ... CUB dataset (Wah et al. 2011)"

Reproducibility Variable: Dataset Splits
Result: No
LLM Response: The paper mentions "Val Concept / Val Task / Test Concept / Test Task" in Figure 3 and refers to "task validation accuracy", implying that validation sets were used. However, the main text does not provide the specific percentages, sample counts, or references to predefined training/validation/test splits needed for reproduction.

Reproducibility Variable: Hardware Specification
Result: No
LLM Response: The paper does not provide any specific details about the hardware (e.g., GPU models, CPU types, memory) used to run the experiments.

Reproducibility Variable: Software Dependencies
Result: No
LLM Response: The paper does not list version numbers for any software dependencies (e.g., programming languages, libraries, or frameworks).

Reproducibility Variable: Experiment Setup
Result: Yes
LLM Response: "We include details on training and architecture hyperparameters in Appendix A.6."