Model-based Policy Optimization with Unsupervised Model Adaptation

Authors: Jian Shen, Han Zhao, Weinan Zhang, Yong Yu

NeurIPS 2020

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, our approach achieves state-of-the-art performance in terms of sample efficiency on a range of continuous control benchmark tasks. |
| Researcher Affiliation | Collaboration | Shanghai Jiao Tong University and D. E. Shaw & Co ({rockyshen, wnzhang, yyu}@apex.sjtu.edu.cn; han.zhao@cs.cmu.edu) |
| Pseudocode | Yes | Algorithm 1 AMPO |
| Open Source Code | Yes | Our code is publicly available at: https://github.com/RockySJ/ampo |
| Open Datasets | Yes | We evaluate AMPO and other baselines on six MuJoCo continuous control tasks with a maximum horizon of 1000 from OpenAI Gym [Brockman et al., 2016], including InvertedPendulum, Swimmer, Hopper, Walker2d, Ant and HalfCheetah. |
| Dataset Splits | No | Every time we train the dynamics model, we randomly sample some real data as a validation set and stop the model training if the model loss does not decrease for five gradient steps, which means we do not choose a specific value for the hyperparameter G1. (See the early-stopping sketch below.) |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments. |
| Software Dependencies | No | We implement all our experiments using TensorFlow. |
| Experiment Setup | Yes | In each adaptation iteration, we train the critic for five steps and then train the feature extractor for one step, and the coefficient α of gradient penalty is set to 10. (See the adaptation-loop sketch below.) |
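The Dataset Splits row describes a patience-based early-stopping rule on a randomly held-out validation set rather than a fixed number of model-training steps. Below is a minimal Python sketch of that rule; `train_with_early_stopping`, `step_fn`, `val_loss_fn`, and the toy least-squares model are illustrative placeholders, not the authors' API.

```python
import numpy as np

def train_with_early_stopping(step_fn, val_loss_fn, patience=5, max_steps=10_000):
    """Keep taking gradient steps and stop once the validation loss has not
    decreased for `patience` consecutive steps, as quoted in the table above.
    `step_fn` performs one gradient step on the training data; `val_loss_fn`
    evaluates the loss on the held-out real data."""
    best, stale = float("inf"), 0
    for _ in range(max_steps):
        step_fn()
        loss = val_loss_fn()
        if loss < best:
            best, stale = loss, 0
        else:
            stale += 1
            if stale >= patience:
                break
    return best

# Toy usage (illustrative only): gradient descent on a 1-D least-squares "model",
# standing in for the learned dynamics model.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 3.0 * x + rng.normal(scale=0.1, size=200)
idx = rng.permutation(200)              # random hold-out split of the real data
tr, va = idx[:150], idx[150:]
w = [0.0]                               # single model parameter

def step_fn():
    grad = 2.0 * np.mean((w[0] * x[tr] - y[tr]) * x[tr])
    w[0] -= 0.05 * grad

best_val = train_with_early_stopping(
    step_fn, lambda: float(np.mean((w[0] * x[va] - y[va]) ** 2)))
```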
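The Experiment Setup row (five critic steps, one feature-extractor step, gradient-penalty coefficient α = 10) reads like a WGAN-GP-style estimator of a Wasserstein-type distance between real and model-generated feature distributions; the sketch below assumes that reading. It is a hedged TensorFlow 2 illustration, not the authors' released implementation: `feature_extractor`, `critic`, `real_batch`, `sim_batch`, and the optimizers are assumed Keras models, tensors, and `tf.keras` optimizers.

```python
import tensorflow as tf

ALPHA_GP = 10.0      # gradient-penalty coefficient alpha quoted above
CRITIC_STEPS = 5     # critic updates per adaptation iteration quoted above

def gradient_penalty(critic, real_feat, sim_feat):
    """Penalize the critic's gradient norm on interpolates of real/simulated features."""
    eps = tf.random.uniform([tf.shape(real_feat)[0], 1], 0.0, 1.0)
    interp = eps * real_feat + (1.0 - eps) * sim_feat
    with tf.GradientTape() as tape:
        tape.watch(interp)
        scores = critic(interp)
    grads = tape.gradient(scores, interp)
    norms = tf.sqrt(tf.reduce_sum(tf.square(grads), axis=-1) + 1e-12)
    return tf.reduce_mean(tf.square(norms - 1.0))

def adaptation_iteration(feature_extractor, critic, real_batch, sim_batch,
                         critic_opt, feat_opt):
    # Five critic steps: maximize the estimated distance between real and
    # simulated feature distributions (minimize its negation plus the penalty).
    for _ in range(CRITIC_STEPS):
        with tf.GradientTape() as tape:
            real_feat = feature_extractor(real_batch)
            sim_feat = feature_extractor(sim_batch)
            w_est = tf.reduce_mean(critic(real_feat)) - tf.reduce_mean(critic(sim_feat))
            critic_loss = -w_est + ALPHA_GP * gradient_penalty(critic, real_feat, sim_feat)
        grads = tape.gradient(critic_loss, critic.trainable_variables)
        critic_opt.apply_gradients(zip(grads, critic.trainable_variables))
    # One feature-extractor step: reduce the estimated distance.
    with tf.GradientTape() as tape:
        w_est = (tf.reduce_mean(critic(feature_extractor(real_batch)))
                 - tf.reduce_mean(critic(feature_extractor(sim_batch))))
    grads = tape.gradient(w_est, feature_extractor.trainable_variables)
    feat_opt.apply_gradients(zip(grads, feature_extractor.trainable_variables))
```

The 5:1 ratio of critic to feature-extractor updates and the penalty coefficient of 10 are taken directly from the quoted setup; everything else in the sketch is an assumption about how such an adversarial feature-adaptation step is typically wired up.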