Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Mind the Gap: A Generative Approach to Interpretable Feature Selection and Extraction

Authors: Been Kim, Julie A. Shah, Finale Doshi-Velez

NeurIPS 2015 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	MGM extracts distinguishing features on real-world datasets of animal features, recipes ingredients, and disease co-occurrence. It also maintains or improves performance when compared to related approaches. We perform a user study with domain experts to show the MGM's ability to help with dataset exploration.
Researcher Affiliation	Academia	Been Kim, Julie Shah: Massachusetts Institute of Technology, Cambridge, MA 02139. Finale Doshi-Velez: Harvard University, Cambridge, MA 02138.
Pseudocode	No	The paper includes a graphical model (Figure 1) and describes the generative process and inference steps in text and equations, but it does not contain a dedicated pseudocode or algorithm block.
Open Source Code	No	The paper does not provide any statement or link indicating the availability of open-source code for the methodology described.
Open Datasets	Yes	We compared the classification performance of our clustering algorithms on several UCI benchmark problems [23]... The animals data set [24] consists of 21 biological and ecological properties of 101 animals... The recipes data set consists of ingredients from recipes taken from the computer cooking contest. (footnote: Computer Cooking Contest: http://liris.cnrs.fr/ccc/ccc2014/doku.php)... Finally, we consider a data set of patients with autism spectrum disorder (ASD) accumulated over the first 15 years of life [25].
Dataset Splits	No	The paper mentions sweeping over the number of clusters (K) and reporting results with the highest ELBO, which relates to model selection, but it does not specify explicit training, validation, or test dataset splits or percentages for data partitioning.
Hardware Specification	No	The paper does not provide any specific details about the hardware (e.g., CPU, GPU models, memory) used to run the experiments.
Software Dependencies	No	The paper does not list specific software dependencies with version numbers (e.g., programming languages, libraries, or frameworks with their respective versions) used for implementation or experimentation.
Experiment Setup	Yes	In all cases, we ran 5 restarts of the MGM. Inference was run for 40 iterations or until the ELBO improved by less than 0.1 relative to the previous iteration. Twenty possible merges were explored in each iteration; each merge exploration involved combining two existing groups into a new group... We swept over the number of clusters from K=4 to K=16 and reported the results with the highest ELBO.
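
The experiment setup quoted above (5 restarts, at most 40 inference iterations, a stopping threshold of 0.1 on the ELBO improvement, and a sweep over K=4 to K=16 keeping the highest ELBO) could be organized roughly as in the sketch below. This is not the authors' code, which the paper does not release. The names `run_inference`, `sweep`, and `toy_make_step` are hypothetical, the threshold is assumed to be an absolute ELBO difference, and the toy ELBO curve stands in for real MGM inference (the merge exploration step is not modeled).

```python
import random

MAX_ITERS = 40        # from the quoted setup
ELBO_TOL = 0.1        # assumed to be an absolute ELBO difference
N_RESTARTS = 5        # from the quoted setup
K_RANGE = range(4, 17)  # K = 4 .. 16 inclusive


def run_inference(step, max_iters=MAX_ITERS, tol=ELBO_TOL):
    """Call `step` until the ELBO gain falls below `tol` or `max_iters` is hit.

    `step` is a hypothetical callable that performs one inference
    iteration and returns the current ELBO.
    Returns (final_elbo, iterations_used).
    """
    prev = float("-inf")
    for it in range(1, max_iters + 1):
        elbo = step()
        if elbo - prev < tol:
            return elbo, it
        prev = elbo
    return prev, max_iters


def sweep(make_step, ks=K_RANGE, n_restarts=N_RESTARTS, seed=0):
    """Try every K with several restarts; keep the run with the highest ELBO.

    Returns (best_elbo, best_k, restart_index, iterations_used).
    """
    rng = random.Random(seed)
    best = None
    for k in ks:
        for restart in range(n_restarts):
            elbo, iters = run_inference(make_step(k, rng))
            if best is None or elbo > best[0]:
                best = (elbo, k, restart, iters)
    return best


def toy_make_step(k, rng):
    """Toy stand-in for model fitting (NOT the MGM).

    The ELBO climbs toward a ceiling that is highest near K=8,
    with a small random offset per restart.
    """
    ceiling = -10.0 * abs(k - 8) + rng.uniform(-1.0, 1.0)
    state = {"elbo": ceiling - 100.0}

    def step():
        # Close half of the remaining gap to the ceiling each iteration.
        state["elbo"] += 0.5 * (ceiling - state["elbo"])
        return state["elbo"]

    return step


if __name__ == "__main__":
    best_elbo, best_k, restart, iters = sweep(toy_make_step)
    print(f"best K={best_k} (restart {restart}, {iters} iterations, ELBO={best_elbo:.2f})")
```

Passing the per-iteration update in as a callable keeps the stopping rule and the sweep independent of any particular model, so a real inference routine could be substituted for the toy one without changing the selection logic.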