Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Off-Policy Proximal Policy Optimization
Authors: Wenjia Meng, Qian Zheng, Gang Pan, Yilong Yin
AAAI 2023 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, the experimental results on representative continuous control tasks validate that our method outperforms the state-of-the-art methods on most tasks. |
| Researcher Affiliation | Academia | Wenjia Meng1, Qian Zheng2,3, Gang Pan2,3, Yilong Yin1 1 School of Software, Shandong University, Jinan, China 2 The State Key Lab of Brain-Machine Intelligence, Zhejiang University, Hangzhou, China 3 College of Computer Science and Technology, Zhejiang University, Hangzhou, China |
| Pseudocode | Yes | Algorithm 1: Off-Policy PPO |
| Open Source Code | No | The paper mentions using existing open-source implementations for comparative methods but does not provide a link or explicit statement for the open-source code of their proposed Off-Policy PPO method. |
| Open Datasets | Yes | Experimental tasks consist of six representative continuous control tasks from OpenAI Gym (Brockman et al. 2016) and MuJoCo (Todorov, Erez, and Tassa 2012), which cover simple and complex tasks: Swimmer, Hopper, HalfCheetah, Walker2d, Ant, and Humanoid. |
| Dataset Splits | No | The paper describes collecting transitions and sampling off-policy data for training and updating networks, but does not explicitly mention or specify a validation dataset split. |
| Hardware Specification | Yes | The experiments are performed on a GPU server that has four Nvidia RTX 3090. |
| Software Dependencies | No | The paper mentions using the Adam optimizer and the ChainerRL implementation for DDPG, but does not provide specific version numbers for these or other key software components or libraries. |
| Experiment Setup | Yes | For hyperparameters, the trace-decay parameter λ is 0.95 and the discount factor γ is 0.99. The length of transitions (K) is set to be 1024. We use the Adam optimizer with learning rate α = 3 × 10⁻⁴. The epoch number N is 10. The minibatch size M is set to be 32. |
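The hyperparameters quoted in the Experiment Setup row can be collected into a single configuration sketch. The field names below are illustrative; the paper does not specify a configuration format:

```python
# Hyperparameters reported in the paper's experiment setup section.
# Key names are hypothetical; only the values are taken from the paper.
OFF_POLICY_PPO_CONFIG = {
    "trace_decay_lambda": 0.95,   # trace-decay parameter λ
    "discount_gamma": 0.99,       # discount factor γ
    "transition_length_K": 1024,  # length of collected transitions K
    "optimizer": "Adam",
    "learning_rate": 3e-4,        # α = 3 × 10⁻⁴
    "epochs_N": 10,               # epoch number N
    "minibatch_size_M": 32,       # minibatch size M
}

if __name__ == "__main__":
    for name, value in OFF_POLICY_PPO_CONFIG.items():
        print(f"{name}: {value}")
```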