Mutual Information Estimation using LSH Sampling

Authors: Ryan Spring, Anshumali Shrivastava

IJCAI 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We show that our LSH sampling estimate provides a superior bias-variance trade-off when compared to other state-of-the-art approaches. We designed the experiments to answer the following four important questions: 1. Does importance sampling alleviate the dependency on the batch size for estimating mutual information using NCE? 2. What is the bias/variance trade-off for our LSH importance sampling approach?"
Researcher Affiliation | Academia | Ryan Spring and Anshumali Shrivastava, Rice University, Houston, Texas, USA; rdspring1@rice.edu, anshumali@rice.edu
Pseudocode | Yes | Algorithm 1: LSS Preprocessing; Algorithm 2: LSS Partition Estimate
Open Source Code | Yes | "The code for the experiments is available online." https://github.com/rdspring1/LSH-Mutual-Information
Open Datasets | Yes | "We applied the various estimators to a correlated Gaussian problem [Poole et al., 2019]. We used a separable critic architecture where f(x, y) = g(x)ᵀh(y), with g and h neural network functions. The X and Y variables are drawn from a 20-d Gaussian distribution with zero mean and correlation ρ."
Dataset Splits | No | The paper describes generating data from a Gaussian distribution and varying parameters like correlation and batch size for evaluation, but it does not specify traditional train/validation/test splits from a pre-existing dataset.
Hardware Specification | No | The paper does not provide any specific details about the hardware used for the experiments (e.g., GPU models, CPU types, or cloud instance specifications).
Software Dependencies | No | The paper does not list specific software dependencies with version numbers required for reproducibility (e.g., PyTorch 1.9, TensorFlow 2.0).
Experiment Setup | Yes | "The LSH data structure used k = 10 bits and L = 10 hash tables." "The LSH data structure contains 5K items with k = 8 bits and L = 10 hash tables. The average sample size per query was 91 elements and a 32 batch size." "For the interpolate method, α = 0.01." "We compare NCE, Uniform IS, and LSH IS for batch size 50."