RePo: Resilient Model-Based Reinforcement Learning by Regularizing Posterior Predictability
Authors: Chuning Zhu, Max Simchowitz, Siri Gadipudi, Abhishek Gupta
NeurIPS 2023 | Conference PDF | Archive PDF
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show its effectiveness in simulation benchmarks with significant spurious variations as well as a real-world egocentric navigation task with noisy TVs in the background. We conduct empirical experiments to answer the following research questions... |
| Researcher Affiliation | Academia | Chuning Zhu University of Washington Seattle, WA 98105 zchuning@cs.washington.edu Max Simchowitz Massachusetts Institute of Technology Boston, MA 02139 msimchow@mit.edu Siri Gadipudi University of Washington Seattle, WA 98105 sg06@uw.edu Abhishek Gupta University of Washington Seattle, WA 98105 abhgupta@cs.washington.edu |
| Pseudocode | Yes | Appendix A, Algorithm Pseudocode: Algorithm 1 (Resilient Model-Based RL by Regularizing Posterior Predictability, RePo) and Algorithm 2 (Semi-Supervised Adaptation of Visual Encoder Using Support Constraint). |
| Open Source Code | Yes | Videos and code: https://zchuning.github.io/repo-website/. |
| Open Datasets | Yes | Distracted DeepMind Control Suite [64, 63] is a variant of DeepMind Control Suite... Realistic ManiSkill is a benchmark we constructed based on the ManiSkill2 benchmark [15], but with realistic backgrounds from Matterport [3]. |
| Dataset Splits | No | The paper states "We tune the initial Lagrange multiplier β0, target KL ϵ, and KL balancing ratio r on our evaluation tasks" (Section C, Implementation Details), which implies some hyperparameter validation. However, it does not describe an explicit training/validation/test dataset split with percentages or sample counts for reproduction. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts, or specific cloud instances) used for running its experiments. |
| Software Dependencies | No | The paper mentions "Adam [29] optimizer" but does not specify version numbers for Adam or any other software dependencies (e.g., programming languages, libraries, frameworks) required for reproducibility. |
| Experiment Setup | Yes | We fix the image size to 64×64 and parameterize the image encoder using a 4-layer CNN with {32, 64, 128, 256} channels, kernel size 4, stride 2, and ReLU activation. We train the RL agent in an online setting, performing 100 training steps for every 500 environment steps... The image encoder, recurrent state space model, and reward model share the same learning rate of 3e-4. The policy and value function use a learning rate of 8e-5. Table 2: Hyperparameters for evaluation tasks. (A minimal setup sketch, with assumed module names, follows the table.) |
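
The Experiment Setup row quotes concrete architecture and optimizer choices. The snippet below is a minimal PyTorch sketch of that configuration under stated assumptions: the module names (`ConvEncoder`, `rssm`, `reward_model`, `policy`, `value_fn`) are hypothetical placeholders, and the world-model/actor-critic split of the optimizers is inferred from the reported learning rates, not taken from the authors' released code.

```python
# Minimal sketch of the reported encoder architecture and learning rates.
# Module names below are illustrative placeholders, not the authors' code.
import torch
import torch.nn as nn


class ConvEncoder(nn.Module):
    """4-layer CNN: {32, 64, 128, 256} channels, kernel size 4, stride 2, ReLU."""

    def __init__(self, in_channels: int = 3):
        super().__init__()
        layers, c_in = [], in_channels
        for c_out in (32, 64, 128, 256):
            layers += [nn.Conv2d(c_in, c_out, kernel_size=4, stride=2), nn.ReLU()]
            c_in = c_out
        self.net = nn.Sequential(*layers, nn.Flatten())

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        # obs: (batch, 3, 64, 64) images, as reported in the paper.
        return self.net(obs)


encoder = ConvEncoder()
features = encoder(torch.zeros(1, 3, 64, 64))  # sanity check: flattened feature vector

# The RSSM, reward model, policy, and value function are defined analogously
# (hypothetical names here). Per the quoted setup, the encoder, recurrent state
# space model, and reward model share lr 3e-4; the policy and value use lr 8e-5.
# model_optim = torch.optim.Adam(
#     list(encoder.parameters()) + list(rssm.parameters()) + list(reward_model.parameters()),
#     lr=3e-4,
# )
# ac_optim = torch.optim.Adam(
#     list(policy.parameters()) + list(value_fn.parameters()), lr=8e-5
# )
```

With this configuration, the quoted online training schedule corresponds to performing 100 gradient steps on the above modules for every 500 environment steps collected.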