Mutual Information Gradient Estimation for Representation Learning

Authors: Liangjian Wen, Yiji Zhou, Lirong He, Mingyuan Zhou, Zenglin Xu

ICLR 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results indicate significant performance improvements in learning useful representations.
Researcher Affiliation | Academia | 1 SMILE Lab, School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, China; 2 Center for Artificial Intelligence, Peng Cheng Laboratory, Shenzhen, China; 3 McCombs School of Business, University of Texas at Austin, Austin, United States; 4 School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, China
Pseudocode | Yes | Algorithm 1: MIGE (Circumstance I). (A hedged sketch of the score estimator behind it appears after this table.)
Open Source Code | No | The provided links (https://github.com/rdevon/DIM and https://github.com/alexalemi/vib_demo) are for the baseline models (DIM and DVB) used for comparison, not for the proposed MIGE method.
Open Datasets | Yes | We test DIM on image datasets CIFAR-10, CIFAR-100 and STL-10 to evaluate our MIGE. ... We demonstrate an implementation of the IB objective on permutation-invariant MNIST using MIGE. (A dataset-loading sketch follows this table.)
Dataset Splits | Yes | For consistent comparison, we follow the experiments of Deep InfoMax (DIM) to set the experimental setup as in Hjelm et al. (2019). ... We adopt the same architecture and empirical settings used in Alemi et al. (2017)...
Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments are provided.
Software Dependencies | No | PyTorch is mentioned as the implementation framework, but no version number is provided for PyTorch or any other software dependency.
Experiment Setup | Yes | For consistent comparison, we adopt the same architecture and empirical settings used in Alemi et al. (2017), except that an initial learning rate of 2e-4 is set for the Adam optimizer and the learning rate is decayed exponentially by a factor of 0.96 every 2 epochs. The threshold of the score function's Stein gradient estimator is set to 0.94. (A configuration sketch follows this table.)
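
The paper's Algorithm 1 (MIGE) rests on estimating the score function ∇_x log q(x) from samples with a spectral Stein gradient estimator. Below is a minimal PyTorch sketch of that estimator with an RBF kernel; the function names (`rbf_kernel`, `ssge_score`) are illustrative, and reading the quoted 0.94 threshold as a cumulative eigenvalue-ratio cutoff is our assumption, not the authors' released code.

```python
import torch


def rbf_kernel(x, y, sigma):
    """k(x, y) = exp(-||x - y||^2 / (2 * sigma^2)) between two sample sets."""
    dist2 = torch.cdist(x, y).pow(2)
    return torch.exp(-dist2 / (2 * sigma ** 2))


def ssge_score(samples, queries, sigma=1.0, threshold=0.94):
    """Estimate grad_x log q(x) at `queries` from i.i.d. `samples` of q.

    samples: (M, D), queries: (N, D); returns an (N, D) score estimate.
    """
    m = samples.shape[0]
    k_ss = rbf_kernel(samples, samples, sigma)           # (M, M) Gram matrix
    eigvals, eigvecs = torch.linalg.eigh(k_ss)           # ascending eigenvalues
    eigvals, eigvecs = eigvals.flip(0), eigvecs.flip(1)  # sort descending

    # Keep the leading J eigenpairs whose cumulative eigenvalue mass reaches
    # the threshold (our reading of the paper's 0.94 setting).
    ratio = torch.cumsum(eigvals, dim=0) / eigvals.sum()
    j = int((ratio < threshold).sum().item()) + 1
    eigvals, eigvecs = eigvals[:j], eigvecs[:, :j]

    # Nystrom approximation of the kernel eigenfunctions at the queries:
    # psi_j(x) = sqrt(M) / lambda_j * sum_m u_{jm} k(x, x_m)
    k_qs = rbf_kernel(queries, samples, sigma)           # (N, M)
    psi = (m ** 0.5) * (k_qs @ eigvecs) / eigvals        # (N, J)

    # beta_j = -(1/M) sum_m grad_{x_m} psi_j(x_m), using the RBF gradient
    # grad_x k(x, y) = -(x - y) / sigma^2 * k(x, y).
    diff = samples.unsqueeze(1) - samples.unsqueeze(0)   # (M, M, D)
    grad_k = -diff * k_ss.unsqueeze(-1) / sigma ** 2     # (M, M, D)
    grad_psi = (m ** 0.5) * torch.einsum('mnd,nj->mjd', grad_k, eigvecs)
    grad_psi = grad_psi / eigvals.view(1, -1, 1)         # (M, J, D)
    beta = -grad_psi.mean(dim=0)                         # (J, D)

    return psi @ beta                                    # (N, D)
```

MIGE then plugs such score estimates into the chain rule to obtain gradient estimates of mutual information with respect to the encoder parameters; see Algorithm 1 in the paper for the full procedure.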
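All four datasets quoted in the Open Datasets row are standard public downloads. A minimal loading sketch via torchvision; the root path and the bare `ToTensor` transform are our own illustrative choices, not the paper's preprocessing:

```python
from torchvision import datasets, transforms

to_tensor = transforms.ToTensor()

# The four datasets cited in the paper's experiments.
cifar10 = datasets.CIFAR10('./data', train=True, download=True, transform=to_tensor)
cifar100 = datasets.CIFAR100('./data', train=True, download=True, transform=to_tensor)
stl10 = datasets.STL10('./data', split='train', download=True, transform=to_tensor)
mnist = datasets.MNIST('./data', train=True, download=True, transform=to_tensor)
```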
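The quoted experiment setup maps directly onto standard PyTorch components. A minimal sketch, assuming `model` is a placeholder network and that `StepLR` is an acceptable realization of "decay by a factor of 0.96 every 2 epochs" (the paper does not name a specific scheduler):

```python
import torch

model = torch.nn.Linear(784, 256)  # placeholder network for illustration

# Adam with the quoted initial learning rate of 2e-4.
optimizer = torch.optim.Adam(model.parameters(), lr=2e-4)

# StepLR multiplies the learning rate by gamma once every step_size epochs,
# matching the quoted schedule of 0.96 every 2 epochs.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=2, gamma=0.96)

for epoch in range(200):
    # ... one training epoch over the data loader goes here ...
    scheduler.step()  # apply the 0.96 decay at epoch boundaries
```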