Sample-Efficient Reinforcement Learning via Conservative Model-Based Actor-Critic

Authors: Zhihai Wang, Jie Wang, Qi Zhou, Bin Li, Houqiang Li

AAAI 2022, pp. 8612-8620 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments demonstrate that CMBAC significantly outperforms state-of-the-art approaches in terms of sample efficiency on several challenging tasks, and the proposed method is more robust than previous methods in noisy environments. Experiments show that CMBAC significantly outperforms state-of-the-art methods in terms of sample efficiency on several challenging control tasks (Brockman et al. 2016; Todorov, Erez, and Tassa 2012).
Researcher Affiliation | Academia | 1) CAS Key Laboratory of Technology in GIPAS, University of Science and Technology of China; 2) Institute of Artificial Intelligence, Hefei Comprehensive National Science Center. Emails: {zhwangx, zhouqida}@mail.ustc.edu.cn, {jiewangx, binli, lihq}@ustc.edu.cn
Pseudocode | Yes | Algorithm 1: Pseudo code for CMBAC.
Open Source Code | No | The paper does not include an explicit statement about making the source code available or provide a link to a code repository for the described methodology.
Open Datasets | Yes | We evaluate CMBAC and these baselines on MuJoCo (Todorov, Erez, and Tassa 2012) benchmark tasks as used in MBPO.
Dataset Splits | No | The paper describes the process of data collection and model usage (e.g., D_env, D_model) but does not provide specific train/validation/test dataset splits (percentages, sample counts, or citations to predefined splits) to reproduce data partitioning.
Hardware Specification | No | The paper does not provide specific hardware details such as GPU or CPU models, processor types, or memory amounts used for running the experiments.
Software Dependencies | No | The paper mentions using MuJoCo benchmark tasks and Soft Actor-Critic (SAC) for policy learning, but it does not provide specific version numbers for these or other software components/libraries (e.g., Python, PyTorch, TensorFlow).
Experiment Setup | Yes | For all environments except Walker2d, we use the number of dropped estimates L = 1. On Walker2d, we use L = 0. For our method, we select the hyperparameter M for each environment independently via grid search. The best hyperparameter for Humanoid, Hopper, Walker2d, and the rest is M = 1, 3, 4, 2, respectively. The details of the experimental setup are in Appendix B.
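The "dropped estimates" step quoted in the Experiment Setup row can be illustrated with a minimal sketch. This is not the authors' code: only the hyperparameter L (the number of dropped estimates) comes from the quoted setup; the assumption that the Q-value estimates come from rollouts through different sampled model combinations, and the helper name conservative_q, are illustrative.

```python
# Minimal sketch (not the authors' implementation) of conservative Q-value
# aggregation: given several Q-value estimates, discard the L largest and
# average the remainder. Only L is taken from the quoted setup; how the
# estimates are produced is an assumption.
import numpy as np

def conservative_q(q_estimates: np.ndarray, num_dropped: int) -> np.ndarray:
    """Average Q estimates after dropping the `num_dropped` largest ones.

    q_estimates: array of shape (num_estimates, batch_size).
    num_dropped: the paper's L (L = 1 in most environments, L = 0 on Walker2d).
    """
    if num_dropped == 0:
        # With L = 0 (Walker2d) nothing is dropped: plain mean over estimates.
        return q_estimates.mean(axis=0)
    # Sort estimates per state-action pair and keep all but the top `num_dropped`.
    sorted_q = np.sort(q_estimates, axis=0)
    kept = sorted_q[:-num_dropped]
    return kept.mean(axis=0)

# Example: 5 estimates for a batch of 3 state-action pairs, with L = 1.
q = np.random.randn(5, 3)
target = conservative_q(q, num_dropped=1)
```

With L = 0 the function reduces to an ordinary ensemble mean; larger L makes the resulting target more pessimistic by excluding the most optimistic estimates.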