reproducibilityindex.ai

Towards Explaining Distribution Shifts

Authors: Sean Kulinski, David I. Inouye

ICML 2023

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	In section 5, we show empirical results on real-world tabular, text, and image-based datasets demonstrating how our explanations can aid an operator in understanding how a distribution has shifted.
Researcher Affiliation	Academia	Department of Electrical and Computer Engineering, Purdue University, West Lafayette, IN, USA.
Pseudocode	Yes	Algorithm 1 Finding k-Sparse Maps; Algorithm 2 Solving for k-Cluster Mappings
Open Source Code	Yes	Code to recreate the experiments can be found at https://github.com/inouye-lab/explaining-distribution-shifts.
Open Datasets	Yes	US Census Adult Income dataset (Kohavi & Becker, 1996); Civil Comments dataset (Borkan et al., 2019); WILDS Camelyon17 dataset (Bandi et al., 2018); UCI Breast Cancer Wisconsin (Original) dataset (Mangasarian & Wolberg, 1990); MNIST digits (Deng, 2012); CelebA dataset (Liu et al., 2015)
Dataset Splits	No	The paper describes how datasets were used for training and analysis (e.g., 'We trained DIVA on the Shifted Multi-MNIST dataset for 600 epochs' and 'The SSVAE was trained for 200 epochs...with 80% of the labels available'), but it does not specify explicit validation splits (e.g., percentages or counts for a validation set) for model tuning or hyperparameter selection separate from the training data.
Hardware Specification	No	The paper does not provide specific details about the hardware used for running experiments, such as particular GPU or CPU models, memory specifications, or cloud computing instance types.
Software Dependencies	No	The paper mentions implementing methods and training models but does not list specific software dependencies with version numbers (e.g., 'PyTorch 1.9', 'Python 3.8').
Experiment Setup	Yes	We trained DIVA on the Shifted Multi-MNIST dataset for 600 epochs with a KL-β value of 10 and latent dimension of 64 for each of the three sub-spaces; The SSVAE was trained for 200 epochs on a concatenation of both P_src and P_tgt with 80% of the labels available per environment, and a batch size of 128.
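
The hyperparameters quoted in the Experiment Setup row can be collected into a small configuration sketch, which may help when trying to rerun the experiments. This is only an illustration: the class and field names (`DivaConfig`, `SsvaeConfig`, `kl_beta`, `labeled_fraction`, and so on) are hypothetical and are not taken from the paper or its repository. Only the numeric values come from the row above. The total latent size is derived by simple multiplication and is not stated in the source.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class DivaConfig:
    """Hypothetical container for the DIVA settings quoted above."""
    dataset: str = "Shifted Multi-MNIST"
    epochs: int = 600
    kl_beta: float = 10.0              # "KL-β value of 10"
    latent_dim_per_subspace: int = 64  # "64 for each of the three sub-spaces"
    num_subspaces: int = 3

    @property
    def total_latent_dim(self) -> int:
        # Derived value (assumption): per-subspace size times subspace count.
        return self.latent_dim_per_subspace * self.num_subspaces


@dataclass(frozen=True)
class SsvaeConfig:
    """Hypothetical container for the SSVAE settings quoted above."""
    epochs: int = 200
    batch_size: int = 128
    labeled_fraction: float = 0.8      # "80% of the labels available per environment"
    train_on_source_and_target: bool = True  # trained on both distributions concatenated


diva = DivaConfig()
ssvae = SsvaeConfig()

if __name__ == "__main__":
    print(asdict(diva))
    print(asdict(ssvae))
```

Settings the row does not report (optimizer, learning rate, random seeds, validation split) are deliberately left out rather than guessed.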