Variational Information Maximisation for Intrinsically Motivated Reinforcement Learning
Authors: Shakir Mohamed, Danilo Jimenez Rezende
NeurIPS 2015 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate that empowerment-based behaviours obtained using variational information maximisation match those using the exact computation. We then apply our algorithms to a broad range of high-dimensional problems for which it is not possible to compute the exact solution, but for which we are able to act according to empowerment learning directly from pixel information. |
| Researcher Affiliation | Industry | Google DeepMind, London {shakir, danilor}@google.com |
| Pseudocode | Yes | Algorithm 1: Stochastic Variational Information Maximisation for Empowerment (see the sketch after this table) |
| Open Source Code | No | The paper links to YouTube videos demonstrating results, but gives no explicit statement about, or link to, source code for the method itself. |
| Open Datasets | No | The paper describes environments such as a "room environment" and a "maze environment", uses "pixel information (on 20x20 images)", and references a "3D physics simulation [29]", but it does not provide access information (link, citation, or repository) for any publicly available or open dataset used for training. |
| Dataset Splits | No | The paper does not explicitly mention training, validation, or test dataset splits, percentages, or sample counts. |
| Hardware Specification | No | The paper mentions using "GPUs" for computation, but does not provide specific hardware details such as GPU models, CPU types, or memory specifications. |
| Software Dependencies | No | The paper mentions using a "convolutional neural network" and "Adagrad", but does not name specific software libraries or version numbers for any dependencies. |
| Experiment Setup | Yes | For all these experiments we used a horizon of K = 5. ... The agent may have other actions, such as picking up a key or laying down a brick. There are no external rewards available and the agent must reason purely using visual (pixel) information. ... The state is the position, velocity and angular momentum of the agent and the predator, and the action is a 2D force vector. |
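The Algorithm 1 row above refers to the paper's stochastic variational information maximisation procedure, which performs gradient ascent on the Barber-Agakov lower bound on empowerment, I(a; s' | s) >= H_omega(a | s) + E[log q(a | s', s)], jointly over a source distribution omega(a | s) and a variational decoder q(a | s', s). The snippet below is a minimal sketch of that bound, not the authors' implementation: it assumes a hypothetical tabular environment with deterministic single-step transitions and parameterises omega and q as logit tables, in place of the K = 5 action sequences and convolutional networks used in the paper.

```python
# Minimal sketch (not the authors' code) of the variational empowerment bound:
# maximise H_omega(a|s) + E[ log q(a|s',s) ] over omega and q by gradient ascent.
import torch
import torch.nn as nn

n_states, n_actions = 16, 4

# Hypothetical deterministic transition table: next_state[s, a] = s'.
next_state = torch.randint(n_states, (n_states, n_actions))

# Source distribution omega(a|s) and variational decoder q(a|s',s),
# parameterised as simple logit tables (stand-ins for the paper's networks).
omega_logits = nn.Parameter(torch.zeros(n_states, n_actions))
q_logits = nn.Parameter(torch.zeros(n_states, n_states, n_actions))  # [s, s', a]

# The paper reports using Adagrad; the learning rate here is arbitrary.
opt = torch.optim.Adagrad([omega_logits, q_logits], lr=0.1)

for step in range(500):
    opt.zero_grad()
    log_omega = torch.log_softmax(omega_logits, dim=-1)   # log omega(a|s)
    log_q = torch.log_softmax(q_logits, dim=-1)           # log q(a|s',s)

    # Entropy term H_omega(a|s), computed per state.
    entropy = -(log_omega.exp() * log_omega).sum(dim=-1)

    # log q(a | s'(s, a), s) for every (s, a), via the deterministic transitions.
    s_idx = torch.arange(n_states).unsqueeze(-1).expand(-1, n_actions)
    log_q_reached = log_q[s_idx, next_state, torch.arange(n_actions)]

    # Barber-Agakov lower bound on I(a; s' | s), one value per state.
    bound = entropy + (log_omega.exp() * log_q_reached).sum(dim=-1)

    loss = -bound.mean()   # ascend the bound for all states at once
    loss.backward()
    opt.step()

# After optimisation, `bound` holds a per-state lower-bound estimate of
# single-step empowerment under this toy model.
```

Because the toy action space is small, the expectation over actions is enumerated exactly and the objective is fully differentiable; the paper instead uses stochastic (sampled) gradients with function approximators so that the same bound scales to K-step action sequences and pixel observations.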