Approximating mutual information of high-dimensional variables using learned representations
Authors: Gokul Gowri, Xiaokang Lun, Allon Klein, Peng Yin
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Using several benchmarks, we show that unlike existing techniques, LMI can approximate MI well for variables with > 10³ dimensions if their dependence structure is captured by low-dimensional representations. |
| Researcher Affiliation | Academia | ¹Wyss Institute for Biologically Inspired Engineering; ²Department of Systems Biology, Harvard University |
| Pseudocode | Yes | Algorithm 1 Estimating MI using LMI Approximation; Algorithm 2 k-nearest neighbor log density ratio estimator; Algorithm 3 Early Stopping; Algorithm 4 Generating multivariate Gaussian datasets with low-dimensional dependence structure; Algorithm 5 KSG estimator for pointwise estimates (a generic KSG-style sketch is given below the table) |
| Open Source Code | Yes | Code availability The code necessary to reproduce all results from this paper are available at https://github.com/ggdna/latent-mutual-information. The lmi Python package can be found at https://github.com/ggdna/latentmi, and its documentation is hosted at https://latentmi.readthedocs.io. |
| Open Datasets | Yes | We resample two different source datasets: (1) binary subset of MNIST, containing only images of 0s and 1s, with 5000 samples and 784 dimensions and (2) embeddings of a subset of protein sequences from E. coli and A. thaliana proteins, with 4402 samples and 1024 dimensions. ... We study a previously published LT-seq data set of in vitro differentiating mouse hematopoietic stem cells [42]. |
| Dataset Splits | Yes | That is, for N joint samples, we train the network using a subset of N/2 samples, then estimate MI by applying the estimator of [10] to latent representations of the remaining N/2 samples. ... They are trained with batch size of 512, with 1:1 train-validation splits, and a maximum of 300 epochs using early stopping procedure provided in Algorithm 3. |
| Hardware Specification | Yes | All experiments in this paper were done using a single NVIDIA RTX 3090. |
| Software Dependencies | No | The paper mentions that 'All models are implemented in Pytorch [52]' but does not provide specific version numbers for Pytorch or any other software dependencies like Python. |
| Experiment Setup | Yes | For a variable with dimensionality d, the encoder has hidden layer sizes L, L/2, L/4 with L = max(2^⌈log2(d)⌉, 1024). Decoders have the same structure inverted, with hidden layer sizes L/4, L/2, L. All MLP activations used are Leaky ReLUs, with negative slope 0.2, except the last layers of decoders, which have no activation. Cross-decoders are trained with 50% dropout after each activation layer. All weights are initialized using Xavier uniform initialization [51], and optimized using Adam, with hyperparameters listed in Table 1. They are trained with batch size of 512, with 1:1 train-validation splits, and a maximum of 300 epochs using early stopping procedure provided in Algorithm 3. (A hedged PyTorch sketch of this setup is given below the table.) |
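
The kNN-based estimators referenced in the Pseudocode row (Algorithms 2 and 5) are described but not reproduced in this report. As a rough orientation, the sketch below is a generic Kraskov (KSG)-style estimator that returns pointwise terms whose mean approximates MI in nats; the function name `ksg_pointwise`, the choice `k=3`, and the correlated-Gaussian toy data are illustrative assumptions, not the authors' implementation or benchmarks.

```python
# Generic textbook KSG estimator with pointwise terms (not the authors' code).
import numpy as np
from scipy.spatial import cKDTree
from scipy.special import digamma

def ksg_pointwise(zx, zy, k=3):
    """Per-sample KSG terms; their mean approximates I(X; Y) in nats."""
    n = len(zx)
    joint = np.hstack([zx, zy])
    # distance to the k-th nearest neighbour in the joint space (Chebyshev metric)
    eps = cKDTree(joint).query(joint, k=k + 1, p=np.inf)[0][:, -1]
    # marginal-space neighbour counts strictly inside that radius (self excluded)
    nx = np.asarray(cKDTree(zx).query_ball_point(zx, eps - 1e-12, p=np.inf, return_length=True)) - 1
    ny = np.asarray(cKDTree(zy).query_ball_point(zy, eps - 1e-12, p=np.inf, return_length=True)) - 1
    return digamma(k) + digamma(n) - digamma(nx + 1) - digamma(ny + 1)

# toy sanity check on correlated 2-D Gaussians (placeholder data, true MI ~0.51 nats)
rng = np.random.default_rng(0)
z = rng.multivariate_normal([0, 0], [[1, 0.8], [0.8, 1]], size=5000)
print("MI estimate (nats):", ksg_pointwise(z[:, :1], z[:, 1:]).mean())
```

In the LMI setting, per the Dataset Splits row, such an estimator would be applied to the low-dimensional latent representations of the held-out half of the samples rather than to the raw high-dimensional variables.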
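
The Experiment Setup row translates fairly directly into a network definition. The following is a minimal PyTorch sketch under the quoted specification, assuming the layer-size formula L = max(2^⌈log2(d)⌉, 1024) recovered above; the latent dimensionality, learning rate, and helper names (`build_encoder`, `build_decoder`, `xavier_init`) are placeholders, and the actual optimizer hyperparameters are listed in Table 1 of the paper.

```python
# Minimal sketch of the described encoder/decoder MLPs (assumptions noted above).
import math
import torch
import torch.nn as nn

def hidden_width(d):
    """L = max(2^ceil(log2(d)), 1024), as recovered from the quoted setup."""
    return max(2 ** math.ceil(math.log2(d)), 1024)

def build_encoder(d, latent_dim=8):
    """Hidden sizes L, L/2, L/4 with LeakyReLU(0.2); the latent projection is linear
    here (the quoted text only states that decoder output layers have no activation)."""
    L = hidden_width(d)
    return nn.Sequential(
        nn.Linear(d, L), nn.LeakyReLU(0.2),
        nn.Linear(L, L // 2), nn.LeakyReLU(0.2),
        nn.Linear(L // 2, L // 4), nn.LeakyReLU(0.2),
        nn.Linear(L // 4, latent_dim),
    )

def build_decoder(d, latent_dim=8, dropout=0.5):
    """Mirrored hidden sizes L/4, L/2, L; 50% dropout after each activation
    (applied to cross-decoders in the quoted setup); linear output layer."""
    L = hidden_width(d)
    return nn.Sequential(
        nn.Linear(latent_dim, L // 4), nn.LeakyReLU(0.2), nn.Dropout(dropout),
        nn.Linear(L // 4, L // 2), nn.LeakyReLU(0.2), nn.Dropout(dropout),
        nn.Linear(L // 2, L), nn.LeakyReLU(0.2), nn.Dropout(dropout),
        nn.Linear(L, d),  # no activation on the last decoder layer
    )

def xavier_init(module):
    """Xavier-uniform initialization for all linear layers."""
    if isinstance(module, nn.Linear):
        nn.init.xavier_uniform_(module.weight)
        nn.init.zeros_(module.bias)

d = 784  # e.g. the binary-MNIST dimensionality
encoder = build_encoder(d).apply(xavier_init)
decoder = build_decoder(d).apply(xavier_init)
optimizer = torch.optim.Adam(  # actual Adam hyperparameters are listed in Table 1
    list(encoder.parameters()) + list(decoder.parameters())
)
```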