Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Adaptation Augmented Model-based Policy Optimization
Authors: Jian Shen, Hang Lai, Minghuan Liu, Han Zhao, Yong Yu, Weinan Zhang
JMLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on challenging continuous control tasks show that FAMPO and IAMPO, coupled with our model usage technique, achieves superior performance against baselines, which demonstrates the effectiveness of the proposed methods. Keywords: Model-based reinforcement learning, distribution shift, occupancy measure, Integral Probability Metric, importance sampling |
| Researcher Affiliation | Academia | Jian Shen, Hang Lai, Minghuan Liu, Han Zhao, Yong Yu, Weinan Zhang. Department of Computer Science, Shanghai Jiao Tong University; Department of Computer Science, University of Illinois, Urbana-Champaign |
| Pseudocode | Yes | Algorithm 1 FAMPO ... Algorithm 2 IAMPO |
| Open Source Code | No | The paper states: "We implement all our experiments using TensorFlow." However, it does not explicitly state that the code for the described methodology is released, nor does it provide a link to a code repository. |
| Open Datasets | Yes | We evaluate our methods and other baselines on six MuJoCo continuous control tasks from OpenAI Gym (Brockman et al., 2016) |
| Dataset Splits | No | The paper describes dynamic data collection into environment and model buffers (Denv and Dmodel) and how samples are drawn from them for training. It does not provide specific fixed dataset splits (e.g., percentages or counts for training, validation, and testing sets) in the traditional supervised learning sense for reproducibility. |
| Hardware Specification | No | The paper mentions implementing experiments using TensorFlow and using MuJoCo environments, but it does not specify any particular hardware components (e.g., GPU models, CPU types, or cloud computing specifications) used for conducting the experiments. |
| Software Dependencies | No | The paper states: "We implement all our experiments using TensorFlow." While a software library is mentioned, a specific version number for TensorFlow or any other software dependency is not provided. |
| Experiment Setup | Yes | Other important hyperparameters used in our methods are chosen by grid search and detailed hyperparameter settings can be found in Appendix E. Table 1: Common hyperparameters for FAMPO and IAMPO. Table 2: Distinct hyperparameters for FAMPO. Table 3: Distinct hyperparameters for IAMPO. |