Information is Power: Intrinsic Control via Information Capture
Authors: Nicholas Rhinehart, Jenny Wang, Glen Berseth, John Co-Reyes, Danijar Hafner, Chelsea Finn, Sergey Levine
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments are designed to answer the following questions: Q1: Intrinsic control capability: Does our latent visitation-based self-supervised reward signal cause the agent to stabilize partially observed visual environments with dynamic entities more effectively than prior self-supervised stabilization objectives? Q2: Properties of the IC2 reward and alternative intrinsic rewards: What types of emergent behaviors does each belief-based objective described in Section 3.1 evoke? In order to answer these questions, we identified environments with the following properties: (i) partial observability, (ii) dynamic entities that the agent can affect, and (iii) high-dimensional observations. Our primary results are presented in Fig. 8. |
| Researcher Affiliation | Collaboration | Nicholas Rhinehart (UC Berkeley); Jenny Wang (UC Berkeley); Glen Berseth (UC Berkeley); John D. Co-Reyes (UC Berkeley); Danijar Hafner (University of Toronto; Google Research, Brain Team); Chelsea Finn (Stanford University); Sergey Levine (UC Berkeley) |
| Pseudocode | Yes | Algorithm 1 Intrinsic Control via Information Capture (IC2) |
| Open Source Code | No | The paper states, 'We present videos on our project page: https://sites.google.com/view/ic2/home,' but it does not explicitly mention that the source code for the methodology is available at this link or elsewhere. |
| Open Datasets | No | The paper describes the 'Two Room Environment', 'VizDoom Defend The Center environment', and 'One Room Capture3D environment' and mentions the 'MiniWorld framework' but does not provide specific access information (links, DOIs, repositories, or formal citations) for any public or open datasets used. |
| Dataset Splits | No | The paper mentions training and evaluation on environments but does not specify any explicit training, validation, or test dataset splits (e.g., percentages or sample counts) for reproducibility. |
| Hardware Specification | No | The paper does not provide specific hardware details such as exact GPU/CPU models, processor types, or memory amounts used for running the experiments. |
| Software Dependencies | No | The paper states 'We implemented an LSSM in PyTorch [55]' but does not provide a specific version number for PyTorch or any other software dependencies. |
| Experiment Setup | Yes | We represent each policy π(at|vposterior,t) as a two-layer fully-connected MLP with 128 units. ... We perform evaluation at 5e6 environment steps with 50 policy rollouts per random seed, with 3 random seeds for each method (150 rollouts total). |
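The experiment-setup row above describes the policy architecture only at a high level: a two-layer fully-connected MLP with 128 units mapping the posterior latent v_posterior,t to a distribution over discrete actions. The sketch below illustrates that shape as a minimal, self-contained forward pass; the latent dimension, number of actions, tanh activation, and weight initialization are all assumptions not stated in the paper, and the real implementation used PyTorch.

```python
import numpy as np

class PolicyMLP:
    """Hypothetical sketch of pi(a_t | v_posterior,t): a two-layer
    fully-connected MLP with 128 hidden units (per the paper's setup).
    Activation and init choices below are illustrative assumptions."""

    def __init__(self, latent_dim, num_actions, hidden=128, seed=0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.normal(0.0, 0.1, (latent_dim, hidden))
        self.b1 = np.zeros(hidden)
        self.w2 = rng.normal(0.0, 0.1, (hidden, num_actions))
        self.b2 = np.zeros(num_actions)

    def __call__(self, v_posterior):
        # Hidden layer with tanh nonlinearity (assumed).
        h = np.tanh(v_posterior @ self.w1 + self.b1)
        # Softmax over discrete action logits.
        logits = h @ self.w2 + self.b2
        e = np.exp(logits - logits.max())
        return e / e.sum()

# Example rollout step with assumed dimensions.
policy = PolicyMLP(latent_dim=32, num_actions=4)
action_probs = policy(np.zeros(32))
```

The evaluation protocol quoted above (50 rollouts per seed, 3 seeds, at 5e6 environment steps) would simply call a policy like this repeatedly inside each rollout loop.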