
Gradient Information Matters in Policy Optimization by Back-propagating through Model

Authors: Chongchong Li, Yue Wang, Wei Chen, Yuting Liu, Zhi-Ming Ma, Tie-Yan Liu

ICLR 2022

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Finally, we empirically demonstrate the proposed algorithm has better sample efficiency when achieving a comparable or better performance on benchmark continuous control tasks. Codes are available at https://github.com/CCreal/ddppo
Researcher Affiliation	Collaboration	1 Beijing Jiaotong University, {18118002,ytliu}@bjtu.edu.cn; 2 Microsoft Research Asia, {yuwang5,tyliu}@microsoft.com; 3 Institute of Computing Technology, Chinese Academy of Sciences, chenwei2022@ict.ac.cn; 4 Academy of Mathematics and Systems Science, Chinese Academy of Sciences, mazm@amt.ac.cn
Pseudocode	Yes	Algorithm 1 Directional Derivative Projection Policy Optimization
Open Source Code	Yes	Codes are available at https://github.com/CCreal/ddppo
Open Datasets	Yes	We evaluate our approach on six continuous control benchmark tasks in the MuJoCo (Todorov et al., 2012) simulator in our experiments: InvertedPendulum-v2, Hopper-v2, Walker2d-v2, HalfCheetah-v2, Ant-v2 and Humanoid-v2.
Dataset Splits	No	While the paper mentions and shows figures related to "validation" (e.g., "early stopping on a validation set" and "Predictive error on validation"), it does not explicitly provide specific details about the dataset splits (percentages, sample counts, or explicit standard split references) used for validation in their experiments, which would be needed to reproduce the data partitioning.
Hardware Specification	No	The paper does not explicitly provide specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments. It only refers to "MuJoCo simulator environments".
Software Dependencies	No	The paper mentions "MuJoCo simulator" but does not specify its version number. It does not list any other software dependencies with specific version numbers (e.g., Python, PyTorch, CUDA versions) that would be needed to replicate the experiment.
Experiment Setup	Yes	Table 1 shows the hyperparameters used for DDPPO results shown in Figure 1. Environment Name: InvertedPendulum, Hopper, Walker2D, HalfCheetah, Ant, Humanoid. epochs: 15, 100, 100, 100, 150, 150. environment steps / epoch: 1000. ensemble size: 7. G1 / environment step: 10. G2 / environment step: 10. H: 3, 2, 3. n: 10, 25, 25, 5. w: 10, 50, 0.1, 1.0, 0.1, 0.1.
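
The rows of Table 1 that list either one shared value or six per-environment values can be written down as a small configuration lookup. The sketch below is only an illustration and is not taken from the ddppo repository: the dictionary keys and the helper names build_config and total_env_steps are hypothetical. The H and n rows are kept as raw tuples because the table text lists fewer than six values for them and the per-environment assignment is not recoverable from it.

```python
# Hypothetical transcription of Table 1; key and function names are
# illustrative assumptions, not identifiers from the authors' code.

# Environment IDs as listed in the Open Datasets row, in Table 1 column order.
ENVIRONMENTS = [
    "InvertedPendulum-v2",
    "Hopper-v2",
    "Walker2d-v2",
    "HalfCheetah-v2",
    "Ant-v2",
    "Humanoid-v2",
]

# Rows with a single value shared by all six environments.
SHARED = {
    "env_steps_per_epoch": 1000,
    "ensemble_size": 7,
    "G1_per_env_step": 10,
    "G2_per_env_step": 10,
}

# Rows with exactly six values, one per environment column.
EPOCHS = [15, 100, 100, 100, 150, 150]
W = [10, 50, 0.1, 1.0, 0.1, 0.1]

# Rows with fewer than six values; which columns each value covers is
# not clear from the table text, so no per-environment mapping is attempted.
UNRESOLVED = {"H": (3, 2, 3), "n": (10, 25, 25, 5)}


def build_config(env_id):
    """Return the unambiguous Table 1 settings for one environment."""
    i = ENVIRONMENTS.index(env_id)
    return {"env_id": env_id, "epochs": EPOCHS[i], "w": W[i], **SHARED}


def total_env_steps(env_id):
    """Total environment steps implied by epochs x steps per epoch."""
    cfg = build_config(env_id)
    return cfg["epochs"] * cfg["env_steps_per_epoch"]


if __name__ == "__main__":
    for env_id in ENVIRONMENTS:
        print(env_id, total_env_steps(env_id))
```

Under these values the environment-step budget per run ranges from 15,000 steps for InvertedPendulum-v2 to 150,000 steps for Ant-v2 and Humanoid-v2.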