Behavior Proximal Policy Optimization

Authors: Zifeng Zhuang, Kun Lei, Jinxin Liu, Donglin Wang, Yilang Guo

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on the D4RL benchmark empirically show this extremely succinct method outperforms state-of-the-art offline RL algorithms.
Researcher Affiliation | Academia | Zifeng Zhuang (1,2), Kun Lei (2), Jinxin Liu (2), Donglin Wang (2,3), Yilang Guo (4). 1: Zhejiang University; 2: School of Engineering, Westlake University; 3: Institute of Advanced Technology, Westlake Institute for Advanced Study; 4: School of Software Engineering, Beijing Jiaotong University.
Pseudocode | Yes | Algorithm 1 Behavior Proximal Policy Optimization (BPPO); a policy-improvement sketch follows the table.
Open Source Code | Yes | Our implementation is available at https://github.com/Dragon-Zhuang/BPPO.
Open Datasets | Yes | Extensive experiments on the D4RL benchmark (Fu et al., 2020) empirically show that BPPO outperforms state-of-the-art offline RL algorithms; a D4RL loading sketch follows the table.
Dataset Splits | No | The paper mentions using the D4RL benchmark but does not explicitly provide details on training, validation, or test splits (e.g., percentages, sample counts, or a statement that standard splits were used).
Hardware Specification | No | The paper does not report the hardware (e.g., GPU/CPU models, memory) used to run its experiments.
Software Dependencies | No | The paper states "Our method is constructed by Pytorch (Paszke et al., 2019)" but does not give version numbers for PyTorch or other software components.
Experiment Setup | Yes | Table 6 lists part of the hyperparameters for the policy improvement phase, including the initial policy learning rate, the initial clip ratio ε, and the asymmetric coefficient ω.
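Since the paper evaluates on D4RL but does not state an explicit split, the following is a minimal loading sketch using the public d4rl API (gym.make on a registered task name and d4rl.qlearning_dataset); the specific task "hopper-medium-v2", the preprocessing, and the 5% validation fraction are assumptions for illustration, not details from the paper.

```python
# Load one D4RL task and build an (assumed) manual train/validation split.
import gym
import d4rl  # importing d4rl registers the offline environments with gym
import numpy as np

env = gym.make("hopper-medium-v2")        # assumed example D4RL MuJoCo task
dataset = d4rl.qlearning_dataset(env)     # dict of numpy arrays

observations = dataset["observations"]          # shape (N, obs_dim)
actions = dataset["actions"]                    # shape (N, act_dim)
rewards = dataset["rewards"]                    # shape (N,)
next_observations = dataset["next_observations"]
terminals = dataset["terminals"]

# D4RL ships a single offline dataset per task; any validation split
# (not specified in the paper) has to be made manually.
num_transitions = observations.shape[0]
valid_fraction = 0.05                           # assumed, not from the paper
split = int(num_transitions * (1.0 - valid_fraction))
train_idx = np.arange(split)
valid_idx = np.arange(split, num_transitions)
```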
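The pseudocode and Table 6 rows reference a clipped policy-improvement step with an initial clip ratio ε and an asymmetric coefficient ω. Below is a minimal PyTorch sketch of such a step, assuming a standard PPO-style clipped surrogate in which the behavior-cloned policy plays the role of the "old" policy; the asymmetric weighting of positive versus negative advantages by ω, the clip-ratio decay schedule, and the policy objects' log_prob(obs, actions) interface are all assumptions for illustration, not the authors' exact implementation.

```python
# Sketch of a BPPO-style clipped policy-improvement step (assumptions noted above).
import torch


def bppo_policy_loss(policy, old_policy, obs, actions, advantages,
                     clip_ratio=0.25, omega=0.9):
    """Clipped-surrogate loss on a batch of offline data (hypothetical interface)."""
    with torch.no_grad():
        old_log_prob = old_policy.log_prob(obs, actions)
        # Assumed asymmetric weighting: emphasize positive advantages via omega.
        advantages = torch.where(advantages > 0,
                                 omega * advantages,
                                 (1.0 - omega) * advantages)

    log_prob = policy.log_prob(obs, actions)
    ratio = torch.exp(log_prob - old_log_prob)

    surrogate = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_ratio, 1.0 + clip_ratio) * advantages
    return -torch.min(surrogate, clipped).mean()


def improve(policy, old_policy, optimizer, batches,
            clip_ratio=0.25, clip_decay=0.96):
    """Outer loop: start from the behavior-cloned policy and decay the clip ratio."""
    for obs, actions, advantages in batches:
        loss = bppo_policy_loss(policy, old_policy, obs, actions,
                                advantages, clip_ratio)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        clip_ratio *= clip_decay  # assumed decay schedule, not from the paper
```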