Mutual Information Estimation using LSH Sampling

Authors: Ryan Spring, Anshumali Shrivastava

IJCAI 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We show that our LSH sampling estimate provides a superior bias-variance trade-off when compared to other state-of-the-art approaches. We designed the experiments to answer the following four important questions: 1. Does importance sampling alleviate the dependency on the batch size for estimating mutual information using NCE? 2. What is the bias/variance trade-off for our LSH importance sampling approach?"
Researcher Affiliation | Academia | Ryan Spring and Anshumali Shrivastava, Rice University, Houston, Texas, USA; rdspring1@rice.edu, anshumali@rice.edu
Pseudocode | Yes | Algorithm 1: LSS Preprocessing; Algorithm 2: LSS Partition Estimate
Open Source Code | Yes | "The code for the experiments is available online." https://github.com/rdspring1/LSH-Mutual-Information
Open Datasets | Yes | "We applied the various estimators to a correlated Gaussian problem [Poole et al., 2019]. We used a separable critic architecture where f(x, y) = g(x)ᵀh(y), with g and h neural network functions. The X and Y variables are drawn from a 20-d Gaussian distribution with zero mean and correlation ρ."
Dataset Splits | No | The paper describes generating data from a Gaussian distribution and varying parameters like correlation and batch size for evaluation, but it does not specify traditional train/validation/test splits from a pre-existing dataset.
Hardware Specification | No | The paper does not provide any specific details about the hardware used for the experiments (e.g., GPU models, CPU types, or cloud instance specifications).
Software Dependencies | No | The paper does not list specific software dependencies with version numbers required for reproducibility (e.g., PyTorch 1.9, TensorFlow 2.0).
Experiment Setup | Yes | "The LSH data structure used k = 10 bits and L = 10 hash tables." "The LSH data structure contains 5K items with k = 8 bits and L = 10 hash tables. The average sample size per query was 91 elements and a 32 batch size." "For the interpolate method, α = 0.01." "We compare NCE, Uniform IS, and LSH IS for batch size 50."