Towards Explaining Distribution Shifts

Authors: Sean Kulinski, David I. Inouye

ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "In section 5, we show empirical results on real-world tabular, text, and image-based datasets demonstrating how our explanations can aid an operator in understanding how a distribution has shifted." |
| Researcher Affiliation | Academia | Department of Electrical and Computer Engineering, Purdue University, West Lafayette, IN, USA. |
| Pseudocode | Yes | Algorithm 1: Finding k-Sparse Maps; Algorithm 2: Solving for k-Cluster Mappings |
| Open Source Code | Yes | "Code to recreate the experiments can be found at https://github.com/inouye-lab/explaining-distribution-shifts." |
| Open Datasets | Yes | US Census Adult Income dataset (Kohavi & Becker, 1996); Civil Comments dataset (Borkan et al., 2019); WILDS Camelyon17 dataset (Bandi et al., 2018); UCI Breast Cancer Wisconsin (Original) dataset (Mangasarian & Wolberg, 1990); MNIST digits (Deng, 2012); CelebA dataset (Liu et al., 2015) |
| Dataset Splits | No | The paper describes how datasets were used for training and analysis (e.g., "We trained DIVA on the Shifted Multi-MNIST dataset for 600 epochs" and "The SSVAE was trained for 200 epochs ... with 80% of the labels available"), but it does not specify explicit validation splits (e.g., percentages or counts for a validation set) for model tuning or hyperparameter selection separate from the training data. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for the experiments, such as GPU or CPU models, memory specifications, or cloud-computing instance types. |
| Software Dependencies | No | The paper mentions implementing methods and training models but does not list specific software dependencies with version numbers (e.g., "PyTorch 1.9", "Python 3.8"). |
| Experiment Setup | Yes | "We trained DIVA on the Shifted Multi-MNIST dataset for 600 epochs with a KL-β value of 10 and latent dimension of 64 for each of the three sub-spaces."; "The SSVAE was trained for 200 epochs on a concatenation of both P_src and P_tgt with 80% of the labels available per environment, and a batch size of 128." |
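For concreteness, the reported experiment-setup values can be collected into a single configuration object, as in the minimal sketch below. This is an illustration only, assuming a PyTorch-style training loop; the class names `DIVAConfig` and `SSVAEConfig` and their field names are hypothetical and are not taken from the paper or its repository. Only the numeric values (epochs, KL-β, latent dimensions, label fraction, batch size) come from the paper text.

```python
# Hypothetical configuration sketch; field values are the hyperparameters
# reported in the paper's "Experiment Setup" quotes, everything else is
# illustrative and not the authors' code.
from dataclasses import dataclass


@dataclass
class DIVAConfig:
    # DIVA trained on the Shifted Multi-MNIST dataset
    epochs: int = 600
    kl_beta: float = 10.0
    latent_dim_per_subspace: int = 64  # 64-dimensional latent per sub-space
    num_subspaces: int = 3             # three latent sub-spaces


@dataclass
class SSVAEConfig:
    # SSVAE trained on the concatenation of source (P_src) and target (P_tgt)
    epochs: int = 200
    label_fraction: float = 0.8  # 80% of labels available per environment
    batch_size: int = 128


if __name__ == "__main__":
    # Print the configurations to confirm the recorded hyperparameters.
    print(DIVAConfig())
    print(SSVAEConfig())
```

Such a sketch only documents the hyperparameters that are stated in the paper; details the table marks as missing (hardware, software versions, validation splits) would still need to be filled in from the released code at https://github.com/inouye-lab/explaining-distribution-shifts.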