Information Prioritization through Empowerment in Visual Model-based RL
Authors: Homanga Bharadhwaj, Mohammad Babaeizadeh, Dumitru Erhan, Sergey Levine
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate the approach on a suite of vision-based robot control tasks with natural video backgrounds, and show that the proposed prioritized information objective outperforms state-of-the-art model-based RL approaches with higher sample efficiency and episodic returns. |
| Researcher Affiliation | Collaboration | Homanga Bharadhwaj (Carnegie Mellon University); Mohammad Babaeizadeh (Google Research, Brain Team); Dumitru Erhan (Google Research, Brain Team); Sergey Levine (Google Research, Brain Team; University of California, Berkeley) |
| Pseudocode | Yes | Algorithm 1: Information Prioritization in Visual Model-based RL (InfoPower) |
| Open Source Code | No | Please refer to the website for a summary and qualitative visualization results https://sites.google.com/view/information-empowerment. The linked website states 'Code Coming Soon', indicating the code is not yet available. |
| Open Datasets | Yes | We perform experiments with modified DeepMind Control Suite environments (Tassa et al., 2018), with natural video distractors from ILSVRC dataset (Russakovsky et al., 2015) in the background. |
| Dataset Splits | No | The paper mentions 'We use 200 videos during training, and reserve 50 videos for testing' for the ILSVRC dataset, but it does not specify a validation dataset split. |
| Hardware Specification | Yes | We implement our approach with TensorFlow 2 and use a single Nvidia V100 GPU and 10 CPU cores for each training run. |
| Software Dependencies | No | The paper mentions 'TensorFlow 2' but does not provide specific version numbers for TensorFlow or any other software libraries used. |
| Experiment Setup | Yes | We use ADAM optimizer, with learning rate of 6e-4 for the latent-state space model, and 8e-5 for the value function and policy optimization. The hyper-parameter c0 for the prioritized information constraint is set to 1000... The encoder consists of 4 convolutional layers with kernel size 4 and channel numbers 32, 64, 128, 256. |
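
The hyperparameters quoted in the Experiment Setup row can be collected into a small configuration sketch. The example below is illustrative only and is not the authors' code: the class and function names are hypothetical, and the stride of 2, the unpadded ("valid") convolutions, the 64x64 input resolution, and the doubling channel widths (32, 64, 128, 256) are assumptions that the table does not confirm. Under those assumptions it traces the feature-map shape after each of the 4 encoder layers.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class SetupSketch:
    """Hypothetical container for the values quoted in the table."""

    # Quoted in the Experiment Setup row.
    model_lr: float = 6e-4         # latent-state space model
    actor_critic_lr: float = 8e-5  # value function and policy
    c0: float = 1000.0             # prioritized information constraint
    kernel_size: int = 4
    # Assumption: channel widths double per layer.
    channels: Tuple[int, ...] = (32, 64, 128, 256)
    # Assumptions: not stated in the table.
    stride: int = 2
    image_size: int = 64


def conv_out(size: int, kernel: int, stride: int) -> int:
    """Output width of an unpadded ('valid') convolution."""
    return (size - kernel) // stride + 1


def encoder_shapes(cfg: SetupSketch) -> List[Tuple[int, int, int]]:
    """Return the (height, width, channels) shape after each conv layer."""
    shapes = []
    size = cfg.image_size
    for ch in cfg.channels:
        size = conv_out(size, cfg.kernel_size, cfg.stride)
        shapes.append((size, size, ch))
    return shapes


def embedding_size(cfg: SetupSketch) -> int:
    """Number of features after flattening the last feature map."""
    h, w, c = encoder_shapes(cfg)[-1]
    return h * w * c


if __name__ == "__main__":
    cfg = SetupSketch()
    for layer, shape in enumerate(encoder_shapes(cfg), start=1):
        print(f"conv{layer}: {shape}")
    print("flattened embedding:", embedding_size(cfg))
```

With these assumed settings the spatial size shrinks 64, 31, 14, 6, 2, so the flattened embedding has 2 x 2 x 256 = 1024 features. A different stride, padding mode, or input resolution would change these numbers.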