Towards Explaining Distribution Shifts
Authors: Sean Kulinski, David I. Inouye
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In section 5, we show empirical results on real-world tabular, text, and image-based datasets demonstrating how our explanations can aid an operator in understanding how a distribution has shifted. |
| Researcher Affiliation | Academia | Department of Electrical and Computer Engineering, Purdue University, West Lafayette, IN, USA. |
| Pseudocode | Yes | Algorithm 1: Finding k-Sparse Maps; Algorithm 2: Solving for k-Cluster Mappings (a hedged sketch of a k-sparse map search follows this table) |
| Open Source Code | Yes | Code to recreate the experiments can be found at https://github.com/inouye-lab/explaining-distribution-shifts. |
| Open Datasets | Yes | US Census Adult Income dataset (Kohavi & Becker, 1996); Civil Comments dataset (Borkan et al., 2019); WILDS Camelyon17 dataset (Bandi et al., 2018); UCI Breast Cancer Wisconsin (Original) dataset (Mangasarian & Wolberg, 1990); MNIST digits (Deng, 2012); CelebA dataset (Liu et al., 2015) |
| Dataset Splits | No | The paper describes how datasets were used for training and analysis (e.g., 'We trained DIVA on the Shifted Multi-MNIST dataset for 600 epochs' and 'The SSVAE was trained for 200 epochs...with 80% of the labels available'), but it does not specify explicit validation splits (e.g., percentages or sample counts) for model tuning or hyperparameter selection separate from the training data. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running experiments, such as particular GPU or CPU models, memory specifications, or cloud computing instance types. |
| Software Dependencies | No | The paper mentions implementing methods and training models but does not list specific software dependencies with version numbers (e.g., 'PyTorch 1.9', 'Python 3.8'). |
| Experiment Setup | Yes | We trained DIVA on the Shifted Multi-MNIST dataset for 600 epochs with a KL-β value of 10 and a latent dimension of 64 for each of the three sub-spaces; the SSVAE was trained for 200 epochs on a concatenation of both P_src and P_tgt with 80% of the labels available per environment and a batch size of 128 (summarized in the config sketch after this table) |
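
To make the pseudocode row above concrete: the paper frames shift explanations as interpretable mappings from the source to the target distribution, with the k-sparse variant restricted to maps that change at most k features. The sketch below is a minimal illustration of that idea, not the authors' Algorithm 1: it greedily selects k features whose per-feature mean shift most reduces the energy distance between the mapped source sample and the target sample. The function names, the mean-shift restriction, and the greedy search are all assumptions made for illustration.

```python
import numpy as np

def energy_distance(x, y):
    """Plug-in estimate of the energy distance between samples x, y (n x d)."""
    def mean_pdist(a, b):
        return np.mean(np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1))
    return 2 * mean_pdist(x, y) - mean_pdist(x, x) - mean_pdist(y, y)

def greedy_k_sparse_mean_shift(src, tgt, k):
    """Greedily pick k features whose mean shift most reduces the energy
    distance between the mapped source sample and the target sample.

    Illustrative stand-in for a k-sparse shift explanation, NOT the
    paper's Algorithm 1: the map family is restricted to per-feature
    mean shifts for simplicity.
    """
    mapped = src.copy()
    chosen = []
    for _ in range(k):
        best_j, best_dist = None, energy_distance(mapped, tgt)
        for j in range(src.shape[1]):
            if j in chosen:
                continue
            trial = mapped.copy()
            trial[:, j] += tgt[:, j].mean() - mapped[:, j].mean()
            d = energy_distance(trial, tgt)
            if d < best_dist:
                best_j, best_dist = j, d
        if best_j is None:
            break  # no remaining feature improves the fit
        mapped[:, best_j] += tgt[:, best_j].mean() - mapped[:, best_j].mean()
        chosen.append(best_j)
    return chosen, mapped
```

For example, `chosen, mapped = greedy_k_sparse_mean_shift(x_src, x_tgt, k=3)` returns the three feature indices whose shift best explains the move from source to target under this restricted map family, together with the mapped source sample.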
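
And to pin down the experiment-setup row, here is the same information gathered into a single hedged config sketch; the dictionary layout and key names are illustrative and do not come from the authors' repository.

```python
# Hyperparameters quoted from the paper; key names are assumptions
# made for readability, not identifiers from the authors' code.
diva_config = {
    "dataset": "Shifted Multi-MNIST",
    "epochs": 600,
    "kl_beta": 10,                  # KL-β weight in the DIVA objective
    "latent_dim_per_subspace": 64,  # one 64-dim latent per sub-space
    "num_subspaces": 3,
}
ssvae_config = {
    "data": "concat(P_src, P_tgt)",  # trained on both environments
    "epochs": 200,
    "label_fraction": 0.8,           # 80% of labels per environment
    "batch_size": 128,
}
```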