Explaining Latent Representations with a Corpus of Examples

Authors: Jonathan Crabbé, Zhaozhi Qian, Fergus Imrie, Mihaela van der Schaar

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Through experiments on tasks ranging from mortality prediction to image classification, we demonstrate that these decompositions are robust and accurate.
Researcher Affiliation | Academia | Jonathan Crabbé, University of Cambridge, jc2133@cam.ac.uk; Zhaozhi Qian, University of Cambridge, zq224@maths.cam.ac.uk; Fergus Imrie, UCLA, imrie@g.ucla.edu; Mihaela van der Schaar, University of Cambridge, The Alan Turing Institute, UCLA, mv472@cam.ac.uk
Pseudocode | No | No pseudocode or algorithm blocks are present in the paper. The methodology is described in prose and mathematical formulations.
Open Source Code | Yes | The code for our method and experiments is available on the Github repository https://github.com/JonathanCrabbe/Simplex.
Open Datasets | Yes | We use two different datasets with distinct tasks for our experiment: (1) 240,486 patients enrolled in the American SEER program [31]. We consider the binary classification task of predicting cancer mortality for patients with prostate cancer. (2) 70,000 MNIST images of handwritten digits [32].
Dataset Splits | No | We start with a dataset D that we split into a training set D_train and a testing set D_test. We train and validate an MLP risk model with D_USA.
Hardware Specification | No | All the experiments have been replicated on different machines.
Software Dependencies | No | The paper mentions training a multilayer perceptron (MLP) and a convolutional neural network (CNN), implying software frameworks, but does not provide specific version numbers for any software dependencies (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup | No | The paper describes general experimental settings such as using K corpus examples and adding an L1 penalty, but does not provide specific hyperparameter values like learning rates, batch sizes, number of epochs, or optimizer settings for the trained MLP or CNN models.
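
The Open Datasets and Dataset Splits rows describe the data pipeline without reporting split proportions. The sketch below shows one way to reproduce the MNIST side of the setup; it assumes torchvision's MNIST loader, and the 80/20 ratio, the fixed seed, and the random_split call are placeholders of our own choosing, not values from the paper.

```python
import torch
from torch.utils.data import random_split
from torchvision import datasets, transforms

# Load the MNIST training partition (60,000 of the 70,000 digits
# cited in the Open Datasets row).
mnist = datasets.MNIST(root="./data", train=True, download=True,
                       transform=transforms.ToTensor())

# The paper describes a D_train / D_test split but not its proportions;
# the 80/20 ratio here is an arbitrary placeholder, not the authors' choice.
n_train = int(0.8 * len(mnist))
torch.manual_seed(0)  # the paper reports no seed either
d_train, d_test = random_split(mnist, [n_train, len(mnist) - n_train])
print(len(d_train), len(d_test))  # 48000 12000
```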
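The Experiment Setup row notes that the method fits a decomposition over K corpus examples with a sparsity penalty, but the paper leaves optimizer settings unspecified. The following is a minimal sketch of that fitting step, not the authors' implementation (see the Github repository above for that): the softmax parameterization, the n_keep-style penalty (used because a plain L1 norm is constant on the probability simplex), and every hyperparameter value (n_keep, reg, n_epochs, lr) are our assumptions.

```python
import torch

def fit_simplex_weights(corpus_latents, test_latent,
                        n_keep=5, reg=0.1, n_epochs=2000, lr=0.1):
    """Approximate test_latent as a convex combination of corpus rows.

    corpus_latents: (K, d) latent representations of the K corpus examples.
    test_latent:    (d,)   latent representation of the example to explain.
    Returns a (K,) weight vector on the probability simplex.
    """
    K = corpus_latents.shape[0]
    logits = torch.zeros(K, requires_grad=True)  # softmax keeps weights >= 0, summing to 1
    opt = torch.optim.Adam([logits], lr=lr)
    for _ in range(n_epochs):
        opt.zero_grad()
        w = torch.softmax(logits, dim=0)
        residual = test_latent - w @ corpus_latents  # decomposition error in latent space
        # A plain L1 norm is constant on the simplex, so penalize the mass
        # outside the n_keep largest weights instead (our stand-in for the
        # paper's sparsity regularization).
        sorted_w, _ = torch.sort(w, descending=True)
        loss = residual.pow(2).sum() + reg * sorted_w[n_keep:].sum()
        loss.backward()
        opt.step()
    return torch.softmax(logits, dim=0).detach()

# Usage with random stand-ins for a real model's latent representations:
H = torch.randn(50, 16)   # corpus of K = 50 latent vectors
h = torch.randn(16)       # test latent to decompose
w = fit_simplex_weights(H, h)
print(torch.topk(w, 3))   # weights of the most influential corpus examples
```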