RePo: Resilient Model-Based Reinforcement Learning by Regularizing Posterior Predictability
Authors: Chuning Zhu, Max Simchowitz, Siri Gadipudi, Abhishek Gupta
NeurIPS 2023 | Conference PDF | Archive PDF
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show its effectiveness in simulation benchmarks with significant spurious variations as well as a real-world egocentric navigation task with noisy TVs in the background. We conduct empirical experiments to answer the following research questions... |
| Researcher Affiliation | Academia | Chuning Zhu University of Washington Seattle, WA 98105 zchuning@cs.washington.edu Max Simchowitz Massachusetts Institute of Technology Boston, MA 02139 msimchow@mit.edu Siri Gadipudi University of Washington Seattle, WA 98105 sg06@uw.edu Abhishek Gupta University of Washington Seattle, WA 98105 abhgupta@cs.washington.edu |
| Pseudocode | Yes | Appendix A, Algorithm Pseudocode: Algorithm 1 (Resilient Model-Based RL by Regularizing Posterior Predictability, RePo) and Algorithm 2 (Semi-Supervised Adaptation of Visual Encoder Using Support Constraint). |
| Open Source Code | Yes | Videos and code: https://zchuning.github.io/repo-website/. |
| Open Datasets | Yes | Distracted DeepMind Control Suite [64, 63] is a variant of DeepMind Control Suite... Realistic ManiSkill is a benchmark we constructed based on the ManiSkill2 benchmark [15], but with realistic backgrounds from Matterport [3]. |
| Dataset Splits | No | The paper states "We tune the initial Lagrange multiplier β0, target KL ϵ, and KL balancing ratio r on our evaluation tasks" (Section C, Implementation Details), which implies some hyperparameter validation. However, it does not describe an explicit training/validation/test dataset split with percentages or sample counts for reproduction. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts, or specific cloud instances) used for running its experiments. |
| Software Dependencies | No | The paper mentions "Adam [29] optimizer" but does not specify version numbers for Adam or any other software dependencies (e.g., programming languages, libraries, frameworks) required for reproducibility. |
| Experiment Setup | Yes | We fix the image size to 64×64 and parameterize the image encoder using a 4-layer CNN with {32, 64, 128, 256} channels, kernel size 4, stride 2, and ReLU activation. We train the RL agent in an online setting, performing 100 training steps for every 500 environment steps... The image encoder, recurrent state space model, and reward model share the same learning rate of 3e-4. The policy and value function use a learning rate of 8e-5. Table 2: Hyperparameters for evaluation tasks. (A minimal setup sketch, with assumed module names, follows the table.) |
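
The Experiment Setup row quotes concrete architecture and optimizer choices. The snippet below is a minimal PyTorch sketch of that configuration under stated assumptions: the module names (`ConvEncoder`, `rssm`, `reward_model`, `policy`, `value_fn`) are hypothetical placeholders, and the world-model/actor-critic split of the optimizers is inferred from the reported learning rates, not taken from the authors' released code.

```python
# Minimal sketch of the reported encoder architecture and learning rates.
# Module names below are illustrative placeholders, not the authors' code.
import torch
import torch.nn as nn


class ConvEncoder(nn.Module):
    """4-layer CNN: {32, 64, 128, 256} channels, kernel size 4, stride 2, ReLU."""

    def __init__(self, in_channels: int = 3):
        super().__init__()
        layers, c_in = [], in_channels
        for c_out in (32, 64, 128, 256):
            layers += [nn.Conv2d(c_in, c_out, kernel_size=4, stride=2), nn.ReLU()]
            c_in = c_out
        self.net = nn.Sequential(*layers, nn.Flatten())

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        # obs: (batch, 3, 64, 64) images, as reported in the paper.
        return self.net(obs)


encoder = ConvEncoder()
features = encoder(torch.zeros(1, 3, 64, 64))  # sanity check: flattened feature vector

# The RSSM, reward model, policy, and value function are defined analogously
# (hypothetical names here). Per the quoted setup, the encoder, recurrent state
# space model, and reward model share lr 3e-4; the policy and value use lr 8e-5.
# model_optim = torch.optim.Adam(
#     list(encoder.parameters()) + list(rssm.parameters()) + list(reward_model.parameters()),
#     lr=3e-4,
# )
# ac_optim = torch.optim.Adam(
#     list(policy.parameters()) + list(value_fn.parameters()), lr=8e-5
# )
```

With this configuration, the quoted online training schedule corresponds to performing 100 gradient steps on the above modules for every 500 environment steps collected.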