Off-Policy Proximal Policy Optimization

Authors: Wenjia Meng, Qian Zheng, Gang Pan, Yilong Yin

AAAI 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Finally, the experimental results on representative continuous control tasks validate that our method outperforms the state-of-the-art methods on most tasks.
Researcher Affiliation | Academia | Wenjia Meng (1), Qian Zheng (2,3), Gang Pan (2,3), Yilong Yin (1). Affiliations: (1) School of Software, Shandong University, Jinan, China; (2) The State Key Lab of Brain-Machine Intelligence, Zhejiang University, Hangzhou, China; (3) College of Computer Science and Technology, Zhejiang University, Hangzhou, China.
Pseudocode | Yes | Algorithm 1: Off-Policy PPO. (A generic sketch of the clipped surrogate objective the algorithm builds on appears after this table.)
Open Source Code | No | The paper mentions using existing open-source implementations for the comparison methods but does not provide a link to, or an explicit statement about, open-source code for the proposed Off-Policy PPO method.
Open Datasets | Yes | Experimental tasks consist of six representative continuous control tasks from OpenAI Gym (Brockman et al. 2016) and MuJoCo (Todorov, Erez, and Tassa 2012), covering both simple and complex tasks: Swimmer, Hopper, HalfCheetah, Walker2d, Ant, and Humanoid. (See the environment snippet after this table.)
Dataset Splits | No | The paper describes collecting transitions and sampling off-policy data for training and updating networks, but does not explicitly specify a validation split.
Hardware Specification | Yes | The experiments are performed on a GPU server with four Nvidia RTX 3090 GPUs.
Software Dependencies | No | The paper mentions using the Adam optimizer and the ChainerRL implementation of DDPG, but does not provide version numbers for these or other key software components or libraries.
Experiment Setup | Yes | For hyperparameters, the trace-decay parameter λ is 0.95 and the discount factor γ is 0.99. The length of collected transitions K is 1024. The Adam optimizer is used with learning rate α = 3 × 10⁻⁴. The epoch number N is 10, and the minibatch size M is 32. (A configuration sketch appears after this table.)
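
The Pseudocode row above refers to Algorithm 1 (Off-Policy PPO), which is not reproduced in this report. As a point of reference only, the following is a minimal PyTorch sketch of the standard PPO clipped surrogate objective that the algorithm builds on; the paper's off-policy handling of replayed data is not shown, and the clip range `clip_eps = 0.2` is a common default assumed here, not a value from the table.

```python
import torch

def ppo_clip_loss(log_probs_new: torch.Tensor,
                  log_probs_old: torch.Tensor,
                  advantages: torch.Tensor,
                  clip_eps: float = 0.2) -> torch.Tensor:
    """Standard PPO clipped surrogate objective, negated for gradient descent.

    Note: this is the generic on-policy PPO objective; the paper's
    Algorithm 1 additionally corrects for off-policy data, which is
    not reproduced here.
    """
    # Importance ratio pi_new(a|s) / pi_old(a|s), computed in log space.
    ratio = torch.exp(log_probs_new - log_probs_old)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # PPO maximizes the elementwise minimum; return its negation as a loss.
    return -torch.min(unclipped, clipped).mean()
```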
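The six benchmark tasks in the Open Datasets row are standard Gym/MuJoCo environments. A minimal usage sketch, assuming the classic `gym` API (pre-0.26 `reset`/`step` signatures) and `-v3` version suffixes, neither of which the paper specifies:

```python
import gym

# Task names as listed in the paper; the "-v3" suffixes are an assumption.
TASKS = ["Swimmer-v3", "Hopper-v3", "HalfCheetah-v3",
         "Walker2d-v3", "Ant-v3", "Humanoid-v3"]

env = gym.make("Hopper-v3")
obs = env.reset()
for _ in range(5):
    # Random actions stand in for the learned policy.
    obs, reward, done, info = env.step(env.action_space.sample())
    if done:
        obs = env.reset()
env.close()
```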
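The Experiment Setup row lists all of the hyperparameters the paper reports. Collected into a single configuration sketch (the values are the paper's; the field names are illustrative, not the authors'):

```python
from dataclasses import dataclass

@dataclass
class OffPolicyPPOConfig:
    """Hyperparameters reported in the paper; field names are illustrative."""
    lam: float = 0.95             # trace-decay parameter λ
    gamma: float = 0.99           # discount factor γ
    rollout_length: int = 1024    # length of collected transitions, K
    learning_rate: float = 3e-4   # Adam learning rate α
    num_epochs: int = 10          # update epochs per data batch, N
    minibatch_size: int = 32      # minibatch size M
```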