Mutual Information Gradient Estimation for Representation Learning
Authors: Liangjian Wen, Yiji Zhou, Lirong He, Mingyuan Zhou, Zenglin Xu
ICLR 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results have indicated significant performance improvement in learning useful representations. |
| Researcher Affiliation | Academia | 1 SMILE Lab, School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, China; 2 Center for Artificial Intelligence, Peng Cheng Laboratory, Shenzhen, China; 3 McCombs School of Business, University of Texas at Austin, Austin, United States; 4 School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, China |
| Pseudocode | Yes | Algorithm 1 MIGE (Circumstance I) |
| Open Source Code | No | The provided links (https://github.com/rdevon/DIM and https://github.com/alexalemi/vib_demo) are for the baseline models (DIM and DVB) used for comparison, not for the proposed MIGE method. |
| Open Datasets | Yes | We test DIM on image datasets CIFAR-10, CIFAR-100 and STL-10 to evaluate our MIGE. ... We demonstrate an implementation of the IB objective on permutation invariant MNIST using MIGE. |
| Dataset Splits | Yes | For consistent comparison, we follow the experiments of Deep InfoMax (DIM) to set the experimental setup as in Hjelm et al. (2019). ... We adopt the same architecture and empirical settings used in Alemi et al. (2017)... |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory) used for running experiments are provided. |
| Software Dependencies | No | PyTorch is mentioned as the implementation framework, but no version number is provided for PyTorch or any other software dependencies. |
| Experiment Setup | Yes (see the sketch below) | For consistent comparison, we adopt the same architecture and empirical settings used in Alemi et al. (2017), except that an initial learning rate of 2e-4 is set for the Adam optimizer, with exponential decay of the learning rate by a factor of 0.96 every 2 epochs. The threshold of the score function's Stein gradient estimator is set to 0.94. |
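For reference, below is a minimal sketch of how the quoted optimizer schedule could be reproduced in PyTorch. Only the Adam learning rate (2e-4) and the 0.96-per-2-epochs decay come from the quoted setup; the model, epoch count, and training loop are placeholder assumptions, not the paper's architecture.

```python
import torch

# Placeholder encoder; the actual architecture follows Alemi et al. (2017)
# on permutation-invariant MNIST and is not reproduced here.
model = torch.nn.Linear(784, 256)

# Adam with the quoted initial learning rate of 2e-4.
optimizer = torch.optim.Adam(model.parameters(), lr=2e-4)

# Exponential decay of the learning rate by a factor of 0.96 every 2 epochs.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=2, gamma=0.96)

for epoch in range(200):  # epoch count is illustrative only
    # ... one training epoch (forward pass, loss, optimizer.step()) goes here ...
    scheduler.step()
```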