InfoNet: Neural Estimation of Mutual Information without Test-Time Optimization
Authors: Zhengyang Hu, Song Kang, Qunsong Zeng, Kaibin Huang, Yanchao Yang
ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate the effectiveness and generalization of our proposed mutual information estimation scheme on various families of distributions and applications. Our results demonstrate that InfoNet and its training process provide a graceful efficiency-accuracy trade-off and order-preserving properties. |
| Researcher Affiliation | Academia | 1Department of Electrical and Electronic Engineering, the University of Hong Kong 2School of Information Science and Technology, University of Science and Technology of China 3Work done as an intern at HKU 4Musketeers Foundation Institute of Data Science, the University of Hong Kong. |
| Pseudocode | Yes | Algorithm 1 InfoNet Training |
| Open Source Code | Yes | Our code and models are available as a comprehensive toolbox to facilitate studies in different fields requiring real-time mutual information estimation. |
| Open Datasets | Yes | To generate training data, we consider sampling the joint distributions (sequences) D = {(x_i, y_i)}_{i=1}^N from Gaussian Mixture Models (GMMs). Our approach involves using the empirical CDF of X and Y to map them to a uniform distribution on [0, 1] prior to training and evaluation. (A minimal sketch of this GMM sampling and empirical-CDF preprocessing appears below the table.) |
| Dataset Splits | No | The paper describes its data generation for training and evaluation. It mentions 'A training batch contains 32 randomly generated GMM distributions (sequences) with a sample length of 2000' and mentions evaluation datasets like 'PointOdyssey'. However, it does not provide explicit training/validation/test split percentages or sample counts for a single dataset. |
| Hardware Specification | Yes | All evaluations are conducted on an RTX 4090 GPU and an AMD Ryzen Threadripper PRO 5975WX 32-Core CPU. |
| Software Dependencies | No | The paper mentions 'deep learning infrastructures' but does not specify any software names with version numbers (e.g., PyTorch 1.x, TensorFlow 2.x, Python 3.x). |
| Experiment Setup | Yes | A training batch contains 32 randomly generated GMM distributions (sequences) with a sample length of 2000. For MINE, the parameters for test-time optimization are a batch size of 100 and a learning rate of 0.001, while MINE-500 indicates 500 training iterations. The KSG method uses a neighborhood size of 5 for best performance. (An illustrative MINE test-time optimization sketch with these settings follows below.) |
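
The Open Datasets row describes the paper's training-data pipeline: sample joint sequences from Gaussian Mixture Models, then push each marginal through its empirical CDF so that X and Y are approximately uniform on [0, 1] before training and evaluation. The sketch below illustrates this preprocessing under stated assumptions; the component count, GMM parameter ranges, and function names are illustrative choices, not the paper's exact generator.

```python
import numpy as np

def sample_gmm_pairs(n_samples=2000, n_components=3, seed=None):
    """Draw (x, y) pairs from a random 2-D Gaussian mixture.

    The Dirichlet weights, mean range, and per-component correlation
    are illustrative assumptions, not the paper's exact settings.
    """
    rng = np.random.default_rng(seed)
    weights = rng.dirichlet(np.ones(n_components))
    means = rng.uniform(-5.0, 5.0, size=(n_components, 2))
    rhos = rng.uniform(-0.9, 0.9, size=n_components)
    comps = rng.choice(n_components, size=n_samples, p=weights)
    out = np.empty((n_samples, 2))
    for k in range(n_components):
        idx = np.where(comps == k)[0]
        cov = np.array([[1.0, rhos[k]], [rhos[k], 1.0]])
        out[idx] = rng.multivariate_normal(means[k], cov, size=len(idx))
    return out

def empirical_cdf(v):
    """Map a 1-D sample to (0, 1] via its empirical CDF (rank / N)."""
    ranks = np.argsort(np.argsort(v))
    return (ranks + 1) / len(v)

xy = sample_gmm_pairs(seed=0)
x_u, y_u = empirical_cdf(xy[:, 0]), empirical_cdf(xy[:, 1])  # uniformized marginals
```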
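For the baselines in the Experiment Setup row, MINE is re-optimized at test time on the evaluated sequence (batch size 100, learning rate 0.001, with MINE-500 meaning 500 iterations). The following is a minimal sketch of such a Donsker-Varadhan MINE baseline; the critic architecture and the final full-sample evaluation are assumptions for illustration, not the paper's exact implementation.

```python
import math
import torch
import torch.nn as nn

def mine_estimate(x, y, iters=500, batch_size=100, lr=1e-3):
    """Test-time MINE: fit a critic T(x, y) on the given sample and return the
    Donsker-Varadhan lower bound. The small MLP critic is an illustrative choice."""
    critic = nn.Sequential(
        nn.Linear(2, 128), nn.ReLU(),
        nn.Linear(128, 128), nn.ReLU(),
        nn.Linear(128, 1),
    )
    opt = torch.optim.Adam(critic.parameters(), lr=lr)
    n = x.shape[0]

    def dv_bound(xb, yb):
        # E_joint[T] - log E_marginal[e^T], with marginals from shuffled y.
        joint = critic(torch.cat([xb, yb], dim=-1)).squeeze(-1)
        y_shuf = yb[torch.randperm(yb.shape[0])]
        marg = critic(torch.cat([xb, y_shuf], dim=-1)).squeeze(-1)
        return joint.mean() - (torch.logsumexp(marg, dim=0) - math.log(marg.shape[0]))

    for _ in range(iters):
        idx = torch.randint(0, n, (batch_size,))
        loss = -dv_bound(x[idx], y[idx])
        opt.zero_grad()
        loss.backward()
        opt.step()

    with torch.no_grad():
        return dv_bound(x, y).item()

# Usage on the uniformized marginals from the previous sketch (column tensors of shape (N, 1)):
# mi_hat = mine_estimate(torch.tensor(x_u, dtype=torch.float32).unsqueeze(1),
#                        torch.tensor(y_u, dtype=torch.float32).unsqueeze(1))
```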