Mind the Gap: A Generative Approach to Interpretable Feature Selection and Extraction
Authors: Been Kim, Julie A. Shah, Finale Doshi-Velez
NeurIPS 2015
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | MGM extracts distinguishing features on real-world datasets of animal features, recipe ingredients, and disease co-occurrence. It also maintains or improves performance when compared to related approaches. We perform a user study with domain experts to show the MGM's ability to help with dataset exploration. |
| Researcher Affiliation | Academia | Been Kim, Julie Shah: Massachusetts Institute of Technology, Cambridge, MA 02139, {beenkim, julie_a_shah}@csail.mit.edu. Finale Doshi-Velez: Harvard University, Cambridge, MA 02138, finale@seas.harvard.edu |
| Pseudocode | No | The paper includes a graphical model (Figure 1) and describes the generative process and inference steps in text and equations, but it does not contain a dedicated pseudocode or algorithm block. |
| Open Source Code | No | The paper does not provide any statement or link indicating the availability of open-source code for the methodology described. |
| Open Datasets | Yes | We compared the classification performance of our clustering algorithms on several UCI benchmark problems [23]... The animals data set [24] consists of 21 biological and ecological properties of 101 animals... The recipes data set consists of ingredients from recipes taken from the computer cooking contest. (Footnote 1: Computer Cooking Contest: http://liris.cnrs.fr/ccc/ccc2014/doku.php)... Finally, we consider a data set of patients with autism spectrum disorder (ASD) accumulated over the first 15 years of life [25]. |
| Dataset Splits | No | The paper mentions sweeping over the number of clusters (K) and reporting results with the highest ELBO, which relates to model selection, but it does not specify explicit training, validation, or test dataset splits or percentages for data partitioning. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., CPU, GPU models, memory) used to run the experiments. |
| Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., programming languages, libraries, or frameworks with their respective versions) used for implementation or experimentation. |
| Experiment Setup | Yes | In all cases, we ran 5 restarts of the MGM. Inference was run for 40 iterations or until the ELBO improved by less than 0.1 relative to the previous iteration. Twenty possible merges were explored in each iteration; each merge exploration involved combining two existing groups into a new group... We swept over the number of clusters from K=4 to K=16 and reported the results with the highest ELBO. |
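
The experiment setup quoted in the last row can be sketched as a small model-selection loop. This is only an illustration under stated assumptions, not the authors' code, since the paper releases none. The `step` callback, the `make_step` factory, and the toy ELBO curve are hypothetical stand-ins for one MGM inference iteration, and the stopping rule is read here as an absolute ELBO gain below 0.1 (the quoted text could also mean a relative change). The merge exploration is not modeled.

```python
from typing import Callable, Iterable, Optional, Tuple

# Hypothetical callback type: runs inference iteration i and returns the ELBO.
Step = Callable[[int], float]


def run_inference(step: Step, max_iters: int = 40, tol: float = 0.1) -> Tuple[float, int]:
    """Iterate until max_iters is reached or the ELBO gain drops below tol.

    Assumption: "improved by less than 0.1" is an absolute difference
    between consecutive iterations.
    """
    prev = step(0)
    iters = 1
    for i in range(1, max_iters):
        cur = step(i)
        iters += 1
        gain = cur - prev
        prev = cur
        if gain < tol:
            break
    return prev, iters


def sweep(make_step: Callable[[int, int], Step],
          ks: Iterable[int] = range(4, 17),
          restarts: int = 5) -> Tuple[float, int, int]:
    """Try every K with several restarts; keep the run with the highest ELBO."""
    best: Optional[Tuple[float, int, int]] = None
    for k in ks:
        for r in range(restarts):
            elbo, _ = run_inference(make_step(k, r))
            if best is None or elbo > best[0]:
                best = (elbo, k, r)
    assert best is not None
    return best


def toy_make_step(k: int, restart: int) -> Step:
    """Toy ELBO curve (made up for illustration): peaks at K=9, rises with i."""
    base = -float((k - 9) ** 2) + 0.01 * restart
    return lambda i: base - 10.0 / (i + 1)


best_elbo, best_k, best_restart = sweep(toy_make_step)
print(best_k, best_restart)
```

In practice `toy_make_step` would be replaced by a factory that initializes a model with `k` clusters from a fresh random seed and returns a function performing one inference update.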