Estimating Conditional Mutual Information for Dynamic Feature Selection

Authors: Soham Gadgil, Ian Connick Covert, Su-In Lee

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Our experiments show that our method provides consistent gains over recent methods across a variety of datasets."
Researcher Affiliation | Academia | "Soham Gadgil, Ian Covert, Su-In Lee. Paul G. Allen School of Computer Science & Engineering, University of Washington"
Pseudocode | Yes | "Pseudocode for DIME's training algorithm along with details about our training implementation are provided in Appendix C. Algorithm 1 summarizes our learning approach, where we jointly train the predictor and value networks according to the objectives in eq. (2) and eq. (3). ... Next, Algorithm 2 shows how features are selected at inference time."
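Since Appendix C is not reproduced here, the following is a minimal PyTorch sketch of what such a joint training step and greedy inference loop could look like. The names `predictor`, `value_net`, `train_step`, and `select_features`, the uniform exploration policy, and the one-step loss-reduction value target are illustrative assumptions, not the authors' exact Algorithms 1 and 2 or the objectives in eq. (2) and eq. (3).

```python
import torch
import torch.nn.functional as F

def train_step(predictor, value_net, optimizer, x, y, num_steps):
    """Hypothetical joint update: the predictor is trained on masked inputs,
    and the value network regresses toward the one-step reduction in
    prediction loss (a stand-in for the paper's CMI-based objective)."""
    mask = torch.zeros_like(x)               # no features selected initially
    total_loss = 0.0
    for _ in range(num_steps):
        logits = predictor(x * mask, mask)
        pred_loss = F.cross_entropy(logits, y, reduction='none')

        cmi = value_net(x * mask, mask)      # per-feature CMI estimates
        # Sample an unselected feature (uniform exploration, for simplicity)
        probs = (1 - mask) / (1 - mask).sum(dim=1, keepdim=True)
        idx = torch.multinomial(probs, 1)
        new_mask = mask.scatter(1, idx, 1.0)

        with torch.no_grad():                # value target: loss improvement
            next_loss = F.cross_entropy(
                predictor(x * new_mask, new_mask), y, reduction='none')
            target = pred_loss - next_loss
        value_loss = F.mse_loss(cmi.gather(1, idx).squeeze(1), target)

        total_loss = total_loss + pred_loss.mean() + value_loss
        mask = new_mask

    optimizer.zero_grad()
    total_loss.backward()
    optimizer.step()

@torch.no_grad()
def select_features(predictor, value_net, x, budget):
    """Greedy selection at inference, in the spirit of Algorithm 2: repeatedly
    pick the unselected feature with the largest estimated CMI."""
    mask = torch.zeros_like(x)
    for _ in range(budget):
        cmi = value_net(x * mask, mask)
        cmi = cmi.masked_fill(mask.bool(), float('-inf'))
        idx = cmi.argmax(dim=1, keepdim=True)
        mask = mask.scatter(1, idx, 1.0)
    return predictor(x * mask, mask), mask
```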
Open Source Code | Yes | "Code is available at https://github.com/suinleelab/DIME"
Open Datasets | Yes | "MNIST. This is the standard digit classification dataset (LeCun et al., 1998). ... Imagenette and ImageNet-100. These are both subsets of the standard ImageNet dataset (Deng et al., 2009). ... MHIST. The MHIST (minimalist histopathology) (Wei et al., 2021) dataset comprises 3,152 hematoxylin and eosin (H&E)-stained Formalin-Fixed Paraffin-Embedded (FFPE) fixed-size images of colorectal polyps from patients at Dartmouth-Hitchcock Medical Center (DHMC). The dataset can be accessed by filling out the form at https://bmirds.github.io/MHIST/."
Dataset Splits | Yes | "MNIST... We downloaded it with PyTorch and used the standard train and test splits, with 10,000 training samples held out as a validation set. ... Imagenette and ImageNet-100... in both cases we split the images to obtain train, validation and test splits. ... ROSMAP... To avoid overlap between the training, validation, or testing sets, we ensured that all samples from a single individual fell into only one of the data splits."
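The quoted MNIST split is straightforward to mirror with torchvision; a sketch, where the fixed seed and the use of `random_split` are assumptions (the excerpt does not state how the 10,000 validation samples were chosen):

```python
import torch
from torch.utils.data import random_split
from torchvision import datasets, transforms

# Standard MNIST train/test splits, with 10,000 of the 60,000 training
# samples held out for validation (seed is an assumption, not from the paper).
transform = transforms.ToTensor()
train_full = datasets.MNIST('data', train=True, download=True, transform=transform)
test_set = datasets.MNIST('data', train=False, download=True, transform=transform)
train_set, val_set = random_split(
    train_full, [50000, 10000],
    generator=torch.Generator().manual_seed(0),
)
```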
Hardware Specification | Yes | "The networks are trained on an NVIDIA RTX 2080 Ti GPU with 12GB of memory. ... the networks are trained on an NVIDIA Quadro RTX 6000 GPU with 24GB of memory."
Software Dependencies | No | "We implemented it in PyTorch (Paszke et al., 2017) using PyTorch Lightning." The paper mentions PyTorch and PyTorch Lightning but does not specify their version numbers.
Experiment Setup | Yes | "For all the tabular datasets, we use multilayer perceptrons (MLPs) with two hidden layers and ReLU non-linearity. We use 128 neurons in the hidden layers for the ROSMAP and Intubation datasets, and 512 neurons for MNIST. The initial learning rate is set to 10^-3 and we also use dropout with probability 0.3 in all layers to reduce overfitting (Srivastava et al., 2014)."
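The quoted setup translates directly into a small PyTorch module; a sketch, where the choice of Adam is an assumption (the excerpt gives only the initial learning rate) and the input/output dimensions are illustrative:

```python
import torch.nn as nn
from torch.optim import Adam

def make_mlp(in_dim, out_dim, hidden, dropout=0.3):
    """Two-hidden-layer MLP with ReLU and dropout, per the quoted setup:
    hidden=128 for ROSMAP/Intubation, hidden=512 for MNIST."""
    return nn.Sequential(
        nn.Linear(in_dim, hidden), nn.ReLU(), nn.Dropout(dropout),
        nn.Linear(hidden, hidden), nn.ReLU(), nn.Dropout(dropout),
        nn.Linear(hidden, out_dim),
    )

model = make_mlp(in_dim=784, out_dim=10, hidden=512)   # MNIST-sized example
optimizer = Adam(model.parameters(), lr=1e-3)          # optimizer family assumed
```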