Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Estimating Conditional Mutual Information for Dynamic Feature Selection

Authors: Soham Gadgil, Ian Connick Covert, Su-In Lee

ICLR 2024

Reproducibility Variable Result LLM Response
Research Type Experimental Our experiments show that our method provides consistent gains over recent methods across a variety of datasets.
Researcher Affiliation Academia Soham Gadgil, Ian Covert, Su-In Lee. Paul G. Allen School of Computer Science & Engineering, University of Washington
Pseudocode Yes Pseudocode for DIME's training algorithm along with details about our training implementation are provided in Appendix C. Algorithm 1 summarizes our learning approach, where we jointly train the predictor and value networks according to the objectives in eq. (2) and eq. (3). ... Next, Algorithm 2 shows how features are selected at inference time.
Open Source Code Yes Code is available at https://github.com/suinleelab/DIME
Open Datasets Yes MNIST. This is the standard digit classification dataset (LeCun et al., 1998). ... Imagenette and ImageNet-100. These are both subsets of the standard ImageNet dataset (Deng et al., 2009). ... MHIST. The MHIST (minimalist histopathology) (Wei et al., 2021) dataset comprises 3,152 hematoxylin and eosin (H&E)-stained Formalin Fixed Paraffin-Embedded (FFPE) fixed-size images of colorectal polyps from patients at Dartmouth-Hitchcock Medical Center (DHMC). The dataset can be accessed by filling out the form at https://bmirds.github.io/MHIST/.
Dataset Splits Yes MNIST... We downloaded it with PyTorch and used the standard train and test splits, with 10,000 training samples held out as a validation set. ... Imagenette and ImageNet-100... in both cases we split the images to obtain train, validation and test splits. ... ROSMAP... To avoid overlap between the training, validation, or testing sets, we ensured that all samples from a single individual fell into only one of the data splits.
Hardware Specification Yes The networks are trained on a NVIDIA RTX 2080 Ti GPU with 12GB of memory. ... the networks are trained on a NVIDIA Quadro RTX 6000 GPU with 24GB of memory.
Software Dependencies No We implemented it in PyTorch (Paszke et al., 2017) using PyTorch Lightning. The paper mentions PyTorch and PyTorch Lightning but does not specify their version numbers.
Experiment Setup Yes For all the tabular datasets, we use multilayer perceptrons (MLPs) with two hidden layers and ReLU non-linearity. We use 128 neurons in the hidden layers for the ROSMAP and Intubation datasets, and 512 neurons for MNIST. The initial learning rate is set to 10^-3 at the start, and we also use dropout with probability 0.3 in all layers to reduce overfitting (Srivastava et al., 2014).
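The quoted setup fully specifies the tabular-dataset architecture, so it can be sketched directly in PyTorch. This is a hedged reconstruction from the description above, not the authors' released code (see their repository for that): two hidden layers with ReLU, dropout 0.3 in all layers, and an initial learning rate of 10^-3; the input and output dimensions here are illustrative placeholders.

```python
import torch
import torch.nn as nn

def make_mlp(in_dim: int, hidden_dim: int, out_dim: int, dropout: float = 0.3) -> nn.Sequential:
    """Two-hidden-layer MLP with ReLU and dropout, as described in the setup."""
    return nn.Sequential(
        nn.Linear(in_dim, hidden_dim),
        nn.ReLU(),
        nn.Dropout(dropout),
        nn.Linear(hidden_dim, hidden_dim),
        nn.ReLU(),
        nn.Dropout(dropout),
        nn.Linear(hidden_dim, out_dim),
    )

# Hidden width per the paper: 512 for MNIST, 128 for ROSMAP/Intubation.
# Input/output sizes below are MNIST-shaped placeholders (784 pixels, 10 classes).
predictor = make_mlp(in_dim=784, hidden_dim=512, out_dim=10)
optimizer = torch.optim.Adam(predictor.parameters(), lr=1e-3)
```

Whether the authors used Adam specifically is not stated in the quoted excerpt; only the initial learning rate and dropout probability are, so the optimizer choice here is an assumption.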