Approximating mutual information of high-dimensional variables using learned representations

Authors: Gokul Gowri, Xiaokang Lun, Allon Klein, Peng Yin

NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Using several benchmarks, we show that unlike existing techniques, LMI can approximate MI well for variables with > 10³ dimensions if their dependence structure is captured by low-dimensional representations.
Researcher Affiliation | Academia | ¹Wyss Institute for Biologically Inspired Engineering, ²Department of Systems Biology, Harvard University
Pseudocode | Yes | Algorithm 1: Estimating MI using LMI Approximation; Algorithm 2: k-nearest neighbor log density ratio estimator; Algorithm 3: Early Stopping; Algorithm 4: Generating multivariate Gaussian datasets with low-dimensional dependence structure; Algorithm 5: KSG estimator for pointwise estimates. (A high-level sketch of the estimation pipeline appears after this table.)
Open Source Code | Yes | Code availability: The code necessary to reproduce all results from this paper is available at https://github.com/ggdna/latent-mutual-information. The lmi Python package can be found at https://github.com/ggdna/latentmi, and its documentation is hosted at https://latentmi.readthedocs.io. (A hedged usage sketch of the package appears after this table.)
Open Datasets | Yes | We resample two different source datasets: (1) a binary subset of MNIST, containing only images of 0s and 1s, with 5000 samples and 784 dimensions, and (2) embeddings of a subset of protein sequences from E. coli and A. thaliana proteins, with 4402 samples and 1024 dimensions. ... We study a previously published LT-seq data set of in vitro differentiating mouse hematopoietic stem cells [42]. (A sketch of constructing the binary MNIST subset appears after this table.)
Dataset Splits | Yes | That is, for N joint samples, we train the network using a subset of N/2 samples, then estimate MI by applying the estimator of [10] to latent representations of the remaining N/2 samples. ... They are trained with a batch size of 512, with 1:1 train-validation splits, and a maximum of 300 epochs using the early stopping procedure provided in Algorithm 3.
Hardware Specification | Yes | All experiments in this paper were done using a single NVIDIA RTX 3090.
Software Dependencies | No | The paper mentions that 'All models are implemented in Pytorch [52]' but does not provide specific version numbers for PyTorch, Python, or any other software dependencies.
Experiment Setup | Yes | For a variable with dimensionality d, the encoder has hidden layer sizes L, L/2, L/4 with L = max(2^⌈log₂(d)⌉, 1024). Decoders have the same structure inverted, with hidden layer sizes L/4, L/2, L. All MLP activations used are LeakyReLUs with negative slope 0.2, except the last layers of decoders, which have no activation. Cross-decoders are trained with 50% dropout after each activation layer. All weights are initialized using Xavier uniform initialization [51] and optimized using Adam, with hyperparameters listed in Table 1. They are trained with a batch size of 512, with 1:1 train-validation splits, and a maximum of 300 epochs using the early stopping procedure provided in Algorithm 3. (A PyTorch sketch of this architecture appears after this table.)
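
The Pseudocode and Dataset Splits rows together describe the overall pipeline: autoencoders are fit on half of the joint samples, and MI is then estimated by applying a k-nearest-neighbor (KSG-style) estimator to latent representations of the held-out half. The following is a minimal sketch of that split-train-estimate flow, not a reproduction of the paper's Algorithms 1-5; `train_autoencoders` and `ksg_mi` are hypothetical stand-ins for the training procedure and the pointwise KSG estimator.

```python
import numpy as np

def lmi_estimate(X, Y, train_autoencoders, ksg_mi, seed=0):
    """Hedged sketch of the LMI pipeline: fit encoders on one half of the
    joint samples, then run a kNN-based MI estimator on latent codes of
    the other half. `train_autoencoders` and `ksg_mi` are hypothetical
    placeholders, not the paper's algorithms."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(len(X))
    fit_idx, est_idx = perm[: len(X) // 2], perm[len(X) // 2 :]

    # Learn low-dimensional representations from the first half of the samples.
    encode_x, encode_y = train_autoencoders(X[fit_idx], Y[fit_idx])

    # Estimate MI from latent representations of the held-out half.
    Zx, Zy = encode_x(X[est_idx]), encode_y(Y[est_idx])
    return ksg_mi(Zx, Zy)
```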
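
For the Open Source Code row, the lmi package linked above is the reference implementation. The snippet below shows how an estimate on data with low-dimensional dependence structure might look; the `lmi.estimate` call and its return values are assumptions based on a reading of the package documentation and should be checked against https://latentmi.readthedocs.io.

```python
import numpy as np
from latentmi import lmi  # pip install latentmi

# Toy data with low-dimensional dependence: X and Y share a 4-dimensional latent.
rng = np.random.default_rng(0)
z = rng.normal(size=(5000, 4))
Xs = z @ rng.normal(size=(4, 784)) + 0.1 * rng.normal(size=(5000, 784))
Ys = z @ rng.normal(size=(4, 1024)) + 0.1 * rng.normal(size=(5000, 1024))

# Assumed API: pointwise MI estimates, latent embeddings, and the trained model.
pmis, embedding, model = lmi.estimate(Xs, Ys)
mi_estimate = np.nanmean(pmis)  # average the pointwise estimates
print(f"LMI estimate: {mi_estimate:.3f} nats")
```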
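
The Open Datasets row mentions a binary MNIST subset (0s and 1s only, 5000 samples, 784 dimensions). A sketch of building such a subset with torchvision is below; the random subsampling shown is an assumption, and the paper's resampling scheme for generating benchmark pairs is not reproduced here.

```python
import numpy as np
from torchvision import datasets

# Load MNIST training images as flattened pixel vectors.
mnist = datasets.MNIST(root="data", train=True, download=True)
images = mnist.data.numpy().reshape(len(mnist), -1) / 255.0  # (60000, 784)
labels = mnist.targets.numpy()

# Keep only 0s and 1s, then subsample 5000 images of 784 dimensions each.
mask = (labels == 0) | (labels == 1)
rng = np.random.default_rng(0)
idx = rng.choice(np.flatnonzero(mask), size=5000, replace=False)
binary_subset = images[idx]   # shape: (5000, 784)
binary_labels = labels[idx]
```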
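
The Experiment Setup row specifies the MLP shapes in enough detail for a PyTorch sketch. The layer widths, LeakyReLU(0.2) activations, 50% cross-decoder dropout, and Xavier uniform weight initialization follow the row above; the latent dimensionality, the output layers mapping into and out of the latent space, and the bias initialization are not stated in this excerpt and are assumptions.

```python
import torch.nn as nn

def hidden_width(d):
    # L = max(2^ceil(log2(d)), 1024): smallest power of two >= d, floored at 1024.
    return max(2 ** (d - 1).bit_length(), 1024)

def make_encoder(d, latent_dim):
    # Hidden layer sizes L, L/2, L/4; final projection to latent_dim is an assumption.
    L = hidden_width(d)
    sizes = [d, L, L // 2, L // 4, latent_dim]
    layers = []
    for fan_in, fan_out in zip(sizes[:-1], sizes[1:]):
        layers += [nn.Linear(fan_in, fan_out), nn.LeakyReLU(0.2)]
    return nn.Sequential(*layers)

def make_decoder(d, latent_dim, dropout=0.0):
    # Mirrored structure: hidden sizes L/4, L/2, L; no activation on the output layer.
    L = hidden_width(d)
    sizes = [latent_dim, L // 4, L // 2, L]
    layers = []
    for fan_in, fan_out in zip(sizes[:-1], sizes[1:]):
        layers += [nn.Linear(fan_in, fan_out), nn.LeakyReLU(0.2)]
        if dropout:  # cross-decoders use 50% dropout after each activation
            layers.append(nn.Dropout(dropout))
    layers.append(nn.Linear(L, d))
    return nn.Sequential(*layers)

def xavier_init(module):
    if isinstance(module, nn.Linear):
        nn.init.xavier_uniform_(module.weight)
        nn.init.zeros_(module.bias)  # bias initialization is an assumption

# Example: encoder for a 784-dimensional variable, cross-decoder to a 1024-dimensional one.
encoder = make_encoder(d=784, latent_dim=8).apply(xavier_init)
cross_decoder = make_decoder(d=1024, latent_dim=8, dropout=0.5).apply(xavier_init)
```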