EMI: Exploration with Mutual Information

Authors: Hyoungseok Kim, Jaekyeom Kim, Yeonwoo Jeong, Sergey Levine, Hyun Oh Song

Venue: ICML 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments show competitive results on challenging locomotion tasks with continuous control and on image-based exploration tasks with discrete actions on Atari. The source code is available at https://github.com/snu-mllab/EMI.
Researcher Affiliation | Academia | Hyoungseok Kim*, Jaekyeom Kim*, Yeonwoo Jeong, and Hyun Oh Song (Seoul National University, Department of Computer Science and Engineering; Neural Processing Research Center); Sergey Levine (UC Berkeley, Department of Electrical Engineering and Computer Sciences). Correspondence to: Hyun Oh Song <hyunoh@snu.ac.kr>.
Pseudocode | Yes | Algorithm 1 shows the complete procedure in detail.
Open Source Code | Yes | The source code is available at https://github.com/snu-mllab/EMI.
Open Datasets | Yes | We compare the experimental performance of EMI to recent prior works on both low-dimensional locomotion tasks with continuous control from the rllab benchmark (Duan et al., 2016) and the complex vision-based tasks with discrete control from the Arcade Learning Environment (Bellemare et al., 2013). A hedged environment-setup sketch follows the table.
Dataset Splits | No | The paper names the environments it uses (the rllab benchmark and Atari) but does not describe any training/validation/test splits (e.g., percentages, counts, or explicit standard splits) beyond naming the environments themselves.
Hardware Specification | No | The paper does not mention the specific hardware (GPU models, CPU types, memory, etc.) used to run the experiments.
Software Dependencies | No | The paper mentions TRPO (Schulman et al., 2015), the Adam optimizer (Kingma & Ba, 2015), and general neural network architectures, but it does not give version numbers for any software (e.g., Python, PyTorch, or TensorFlow).
Experiment Setup | Yes | In the locomotion experiments, we use a 2-layer fully connected neural network as the policy network. In the Atari experiments, we use a 2-layer convolutional neural network followed by a single-layer fully connected neural network. We convert the 84 x 84 input RGB frames to grayscale images and resize them to 52 x 52, following the practice in Tang et al. (2017). The embedding dimensionality is set to d = 2 and the intrinsic reward coefficient to η = 0.001 in all of the environments. We use the Adam optimizer (Kingma & Ba, 2015) to train the embedding networks. A hedged architecture sketch follows the table.
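
Since the Open Datasets row names the two benchmark suites without showing how they are set up, here is a minimal Python sketch of instantiating them. The specific Atari game is a hypothetical choice (this section does not list the exact tasks used), and the rllab environments are noted only in a comment because rllab ships its own environment classes rather than Gym IDs.

```python
import gym  # OpenAI Gym exposes the Arcade Learning Environment games

# Vision-based exploration task with discrete actions. The game below is a
# hypothetical example; this section does not say which Atari games were used.
atari_env = gym.make("MontezumaRevengeNoFrameskip-v4")
print(atari_env.action_space)       # Discrete(18): the ALE joystick actions
print(atari_env.observation_space)  # Box(0, 255, (210, 160, 3), uint8) raw RGB

# The continuous-control locomotion tasks come from rllab (Duan et al., 2016),
# which provides its own environment classes instead of Gym environment IDs.
```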
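
The Experiment Setup row fixes only a few facts: the layer counts, the 84 x 84 RGB to 52 x 52 grayscale preprocessing, d = 2, η = 0.001, and the Adam optimizer. The sketch below implements just those constraints in PyTorch; the framework choice, kernel sizes, strides, and channel counts are all assumptions, not the paper's values.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

D = 2        # embedding dimensionality (from the paper)
ETA = 0.001  # intrinsic reward coefficient (from the paper; unused in this sketch)

def preprocess(rgb_frames: torch.Tensor) -> torch.Tensor:
    """Convert a batch of 84x84 RGB frames to 52x52 grayscale images."""
    # rgb_frames: (N, 3, 84, 84), values in [0, 1]; ITU-R BT.601 luma weights.
    gray = (0.299 * rgb_frames[:, 0] + 0.587 * rgb_frames[:, 1]
            + 0.114 * rgb_frames[:, 2]).unsqueeze(1)          # (N, 1, 84, 84)
    return F.interpolate(gray, size=(52, 52), mode="bilinear",
                         align_corners=False)                 # (N, 1, 52, 52)

class EmbeddingNet(nn.Module):
    """Two conv layers followed by one fully connected layer, as described."""
    def __init__(self, d: int = D):
        super().__init__()
        # Kernel sizes, strides, and channel counts below are hypothetical.
        self.conv1 = nn.Conv2d(1, 16, kernel_size=8, stride=4)   # -> (16, 12, 12)
        self.conv2 = nn.Conv2d(16, 32, kernel_size=4, stride=2)  # -> (32, 5, 5)
        self.fc = nn.Linear(32 * 5 * 5, d)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = F.relu(self.conv1(x))
        x = F.relu(self.conv2(x))
        return self.fc(x.flatten(start_dim=1))                   # (N, d)

net = EmbeddingNet()
optimizer = torch.optim.Adam(net.parameters())           # Adam, per the paper
embeddings = net(preprocess(torch.rand(4, 3, 84, 84)))   # -> shape (4, 2)
```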