Model-Based Offline Meta-Reinforcement Learning with Regularization

Authors: Sen Lin, Jialin Wan, Tengyu Xu, Yingbin Liang, Junshan Zhang

ICLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Experiments corroborate the superior performance of MerPO over existing offline Meta-RL methods."
Researcher Affiliation | Academia | "Sen Lin (1), Jialin Wan (1), Tengyu Xu (2), Yingbin Liang (2), Junshan Zhang (1,3); (1) School of ECEE, Arizona State University; (2) Department of ECE, The Ohio State University; (3) Department of ECE, University of California, Davis"
Pseudocode | Yes | "Algorithm 1 RAC"
Open Source Code | Yes | "For the experimental results presented in the main text, we include the code in the supplemental material, and specify all the training details in Appendix A."
Open Datasets | Yes | "We evaluate RAC on several continuous control tasks in the D4RL benchmark (Fu et al., 2020) from OpenAI Gym (Brockman et al., 2016)." A sketch of how such datasets are typically loaded follows the table.
Dataset Splits | No | The paper mentions using a "held-out set" for validation prediction error when selecting models, but does not specify split percentages or sample counts for training, validation, and test sets, so the data partitioning cannot be reproduced precisely.
Hardware Specification | No | The paper does not provide hardware details (e.g., GPU/CPU models, memory) used for running the experiments.
Software Dependencies | No | The paper mentions OpenAI Gym and D4RL and references SAC, but does not provide version numbers for any software dependencies or libraries needed for reproducibility.
Experiment Setup | Yes | "Table 1: Hyperparameters for RAC" and "Table 2: Hyperparameters for MerPO."
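The Open Datasets row cites the D4RL benchmark served through OpenAI Gym. Below is a minimal sketch of how such an offline dataset is typically loaded with the standard gym/d4rl API; it is not taken from the paper's released code, and the environment name "halfcheetah-medium-v2" is an illustrative assumption rather than a task the paper necessarily uses.

    # Minimal sketch: loading a D4RL offline dataset via the standard gym/d4rl API.
    # The environment name below is illustrative only; the paper's exact task names
    # are not listed in this summary.
    import gym
    import d4rl  # importing d4rl registers its offline environments with gym

    env = gym.make("halfcheetah-medium-v2")

    # Returns a dict of NumPy arrays: 'observations', 'actions', 'rewards',
    # 'terminals', and 'next_observations', ready for offline RL training.
    dataset = d4rl.qlearning_dataset(env)
    print(dataset["observations"].shape, dataset["actions"].shape)

The raw, unprocessed dataset can also be obtained with env.get_dataset(); d4rl.qlearning_dataset additionally aligns next_observations so the data can be consumed directly as transition tuples.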