Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Comparing Distributions by Measuring Differences that Affect Decision Making

Authors: Shengjia Zhao, Abhishek Sinha, Yutong He, Aidan Perreault, Jiaming Song, Stefano Ermon

ICLR 2022 | Venue PDF | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "We apply our approach to two-sample tests, and on various benchmarks, we achieve superior test power compared to competing methods." "We demonstrate the effectiveness of H-divergence in two sample tests" |
| Researcher Affiliation | Academia | "Department of Computer Science, Stanford University" |
| Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks; methodological steps are described in prose and mathematical formulations. |
| Open Source Code | Yes | "The code to reproduce our experiments can be found here." [footnote] |
| Open Datasets | Yes | "We follow Liu et al. (2020) and consider four datasets: Blob (Liu et al., 2020), HDGM (Liu et al., 2020), HIGGS (Adam-Bourdarios et al., 2014) and MNIST (LeCun & Cortes, 2010)." "We use the NOAA database which contains daily weather from thousands of weather stations at different geographical locations." "We obtain the crop yield dataset from (FAOSTAT et al., 2006)" |
| Dataset Splits | Yes | "We split each dataset into two equal partitions: a training set to tune hyper-parameters, and a validation set to compute the final test output." |
| Hardware Specification | No | The paper does not specify the hardware used to run the experiments, such as CPU or GPU models. |
| Software Dependencies | No | The paper mentions implementing methods and using various models (e.g., mixture of Gaussian distributions, Parzen density estimator, Variational Autoencoder, Kernel Ridge regression), but does not specify software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | "We choose φ(θ, λ) = ((θ^s + λ^s)/2)^(1/s) for s > 1..." "We define l(x, a) as the negative log likelihood of x under distribution a, where a is in a certain model family A. We experiment with mixture of Gaussian distributions, Parzen density estimator and Variational Autoencoder (Kingma & Welling, 2013). Our hyper-parameters consist of the best parameter s and also the best generative model family." "We use α = 0.05 in all two-sample test experiments. Each permutation test uses 100 permutations, and we run each test 100 times to compute the test power" |