Comparing Distributions by Measuring Differences that Affect Decision Making

Authors: Shengjia Zhao, Abhishek Sinha, Yutong He, Aidan Perreault, Jiaming Song, Stefano Ermon

ICLR 2022

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "We apply our approach to two-sample tests, and on various benchmarks, we achieve superior test power compared to competing methods. We demonstrate the effectiveness of H-divergence in two-sample tests." |
| Researcher Affiliation | Academia | "Department of Computer Science, Stanford University" |
| Pseudocode | No | The paper contains no pseudocode or algorithm blocks; methodological steps are described in prose and mathematical formulations. |
| Open Source Code | Yes | "The code to reproduce our experiments can be found here." [footnote] |
| Open Datasets | Yes | "We follow Liu et al. (2020) and consider four datasets: Blob (Liu et al., 2020), HDGM (Liu et al., 2020), HIGGS (Adam-Bourdarios et al., 2014) and MNIST (LeCun & Cortes, 2010)." "We use the NOAA database which contains daily weather from thousands of weather stations at different geographical locations." "We obtain the crop yield dataset from (FAOSTAT et al., 2006)." |
| Dataset Splits | Yes | "We split each dataset into two equal partitions: a training set to tune hyper-parameters, and a validation set to compute the final test output." (See the split sketch below the table.) |
| Hardware Specification | No | The paper does not specify the hardware used to run the experiments, such as CPU or GPU models. |
| Software Dependencies | No | The paper mentions implementing several models (e.g., mixture of Gaussian distributions, Parzen density estimator, Variational Autoencoder, Kernel Ridge regression), but does not list software dependencies with version numbers (e.g., Python, PyTorch, or TensorFlow versions). |
| Experiment Setup | Yes | "We choose φ(θ, λ) = ((θ^s + λ^s)/2)^(1/s) for s > 1... We define l(x, a) as the negative log likelihood of x under distribution a, where a is in a certain model family A. We experiment with mixture of Gaussian distributions, Parzen density estimator and Variational Autoencoder (Kingma & Welling, 2013). Our hyper-parameters consist of the best parameter s and also the best generative model family. We use α = 0.05 in all two-sample test experiments. Each permutation test uses 100 permutations, and we run each test 100 times to compute the test power." (See the test sketch below the table.) |