VIME: Variational Information Maximizing Exploration

Authors: Rein Houthooft, Xi Chen, Yan Duan, John Schulman, Filip De Turck, Pieter Abbeel

NeurIPS 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate that VIME achieves significantly better performance compared to heuristic exploration methods across a variety of continuous control tasks and algorithms, including tasks with very sparse rewards.
Researcher Affiliation | Collaboration | UC Berkeley, Department of Electrical Engineering and Computer Sciences; Ghent University - imec, Department of Information Technology; OpenAI
Pseudocode | Yes | Algorithm 1: Variational Information Maximizing Exploration (VIME)
Open Source Code | No | The paper does not provide any statement or link indicating that its source code is publicly available.
Open Datasets | Yes | All experiments make use of the rllab [15] benchmark code base and the complementary continuous control tasks suite. The following tasks are part of the experimental setup: Cart Pole ($S \subseteq \mathbb{R}^{4}$, $A \subseteq \mathbb{R}^{1}$), Cart Pole Swingup ($S \subseteq \mathbb{R}^{4}$, $A \subseteq \mathbb{R}^{1}$), Double Pendulum ($S \subseteq \mathbb{R}^{6}$, $A \subseteq \mathbb{R}^{1}$), Mountain Car ($S \subseteq \mathbb{R}^{3}$, $A \subseteq \mathbb{R}^{1}$), locomotion tasks Half Cheetah ($S \subseteq \mathbb{R}^{20}$, $A \subseteq \mathbb{R}^{6}$), Walker2D ($S \subseteq \mathbb{R}^{20}$, $A \subseteq \mathbb{R}^{6}$), and the hierarchical task Swimmer Gather ($S \subseteq \mathbb{R}^{33}$, $A \subseteq \mathbb{R}^{2}$).
Dataset Splits | No | The paper does not specify traditional training/validation/test dataset splits. It describes continuous interaction with environments, not static datasets split into these partitions.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU, GPU models, or memory) used for running the experiments.
Software Dependencies | No | The paper mentions using the "rllab benchmark code base" but does not specify any software versions for libraries, frameworks, or programming languages.
Experiment Setup | No | The paper states that "The exact setup is described in the Appendix," but the appendix is not part of the main text. While the main text discusses the hyperparameter η, it does not give the specific values used for the main experimental results.
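
For readers unfamiliar with the role of η mentioned in the last row: in VIME the external reward is augmented with η times the KL divergence between the updated and the previous variational posterior over the dynamics-model parameters. The following is a minimal sketch of that weighting only, assuming a fully factorized (diagonal) Gaussian posterior. The function names and the value `eta=0.1` are hypothetical illustrations, not values or code taken from the paper.

```python
import math


def diag_gaussian_kl(mu_new, sigma_new, mu_old, sigma_old):
    """KL[N(mu_new, sigma_new^2) || N(mu_old, sigma_old^2)] for diagonal Gaussians.

    Each argument is a sequence of per-parameter means or standard deviations.
    """
    kl = 0.0
    for mn, sn, mo, so in zip(mu_new, sigma_new, mu_old, sigma_old):
        # Closed-form KL between two univariate Gaussians, summed over dimensions.
        kl += math.log(so / sn) + (sn ** 2 + (mn - mo) ** 2) / (2.0 * so ** 2) - 0.5
    return kl


def augmented_reward(reward, kl, eta):
    """External reward plus an eta-weighted information-gain bonus."""
    return reward + eta * kl


# Hypothetical example: a one-parameter posterior whose mean shifts by 1.0.
kl = diag_gaussian_kl([1.0], [1.0], [0.0], [1.0])
r_aug = augmented_reward(1.0, kl, eta=0.1)  # eta=0.1 is an assumed value
```

A larger η pushes the agent toward transitions that change the model posterior more; η = 0 recovers the unmodified reward.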