Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Adaptation Augmented Model-based Policy Optimization

Authors: Jian Shen, Hang Lai, Minghuan Liu, Han Zhao, Yong Yu, Weinan Zhang

JMLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on challenging continuous control tasks show that FAMPO and IAMPO, coupled with our model usage technique, achieves superior performance against baselines, which demonstrates the effectiveness of the proposed methods. Keywords: Model-based reinforcement learning, distribution shift, occupancy measure, Integral Probability Metric, importance sampling
Researcher Affiliation | Academia | Jian Shen EMAIL Hang Lai EMAIL Minghuan Liu EMAIL Han Zhao EMAIL Yong Yu EMAIL Weinan Zhang EMAIL Department of Computer Science, Shanghai Jiao Tong University Department of Computer Science, University of Illinois, Urbana-Champaign
Pseudocode | Yes | Algorithm 1 FAMPO ... Algorithm 2 IAMPO
Open Source Code | No | The paper states: "We implement all our experiments using TensorFlow." However, it does not explicitly state that the code for the described methodology is released or provide a link to a code repository.
Open Datasets | Yes | We evaluate our methods and other baselines on six MuJoCo continuous control tasks from OpenAI Gym (Brockman et al., 2016)
Dataset Splits | No | The paper describes dynamic data collection into environment and model buffers (D_env and D_model) and how samples are drawn from them for training. It does not provide specific fixed dataset splits (e.g., percentages or counts for training, validation, and testing sets) in the traditional supervised learning sense for reproducibility.
Hardware Specification | No | The paper mentions implementing experiments using TensorFlow and using MuJoCo environments, but it does not specify any particular hardware components (e.g., GPU models, CPU types, or cloud computing specifications) used for conducting the experiments.
Software Dependencies | No | The paper states: "We implement all our experiments using TensorFlow." While a software library is mentioned, a specific version number for TensorFlow or any other software dependency is not provided.
Experiment Setup | Yes | Other important hyperparameters used in our methods are chosen by grid search and detailed hyperparameter settings can be found in Appendix E. Table 1: Common hyperparameters for FAMPO and IAMPO. Table 2: Distinct hyperparameters for FAMPO. Table 3: Distinct hyperparameters for IAMPO.