Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Comparing Distributions by Measuring Differences that Affect Decision Making

Authors: Shengjia Zhao, Abhishek Sinha, Yutong He, Aidan Perreault, Jiaming Song, Stefano Ermon

ICLR 2022 | Venue PDF | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "We apply our approach to two-sample tests, and on various benchmarks, we achieve superior test power compared to competing methods." "We demonstrate the effectiveness of H-divergence in two sample tests" |
| Researcher Affiliation | Academia | "Department of Computer Science, Stanford University" |
| Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks; methodological steps are described in prose and mathematical formulations. |
| Open Source Code | Yes | "The code to reproduce our experiments can be found here." [footnote] |
| Open Datasets | Yes | "We follow Liu et al. (2020) and consider four datasets: Blob (Liu et al., 2020), HDGM (Liu et al., 2020), HIGGS (Adam-Bourdarios et al., 2014) and MNIST (LeCun & Cortes, 2010)." "We use the NOAA database which contains daily weather from thousands of weather stations at different geographical locations." "We obtain the crop yield dataset from (FAOSTAT et al., 2006)" |
| Dataset Splits | Yes | "We split each dataset into two equal partitions: a training set to tune hyper-parameters, and a validation set to compute the final test output." |
| Hardware Specification | No | The paper does not specify the hardware used to run the experiments, such as CPU or GPU models. |
| Software Dependencies | No | The paper mentions implementing methods and using various models (e.g., mixture of Gaussian distributions, Parzen density estimator, Variational Autoencoder, Kernel Ridge regression), but does not specify software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | "We choose φ(θ, λ) = ((θ^s + λ^s)/2)^(1/s) for s > 1..." "We define l(x, a) as the negative log likelihood of x under distribution a, where a is in a certain model family A. We experiment with mixture of Gaussian distributions, Parzen density estimator and Variational Autoencoder (Kingma & Welling, 2013). Our hyper-parameters consist of the best parameter s and also the best generative model family." "We use α = 0.05 in all two-sample test experiments. Each permutation test uses 100 permutations, and we run each test 100 times to compute the test power" |