Efficient Diffusion Policies For Offline Reinforcement Learning
Authors: Bingyi Kang, Xiao Ma, Chao Du, Tianyu Pang, Shuicheng Yan
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct extensive experiments on the D4RL benchmark. The results show that EDP can reduce the diffusion policy training time from 5 days to 5 hours on gym-locomotion tasks. Moreover, we show that EDP is compatible with various offline RL algorithms (TD3, CRR, and IQL) and achieves new state-of-the-art on D4RL by large margins over previous methods. |
| Researcher Affiliation | Industry | Bingyi Kang, Xiao Ma, Chao Du, Tianyu Pang, Shuicheng Yan (Sea AI Lab); {bingykang,yusufma555,duchao0726}@gmail.com, {tianyupang,yansc}@sea.com |
| Pseudocode | Yes | The overall algorithm for our Reinforcement Guided Diffusion Policy Learning is given in Alg. 1. The detailed algorithm for energy-based action selection is given in Alg. 2. (A hedged sketch of an energy-based action-selection step is given below the table.) |
| Open Source Code | Yes | Our code is available at https://github.com/sail-sg/edp. |
| Open Datasets | Yes | We conduct extensive experiments on the D4RL benchmark [2] |
| Dataset Splits | No | The paper mentions training and evaluation but does not explicitly provide the specific training/validation/test split percentages or sample counts used for reproduction. It refers to the D4RL benchmark but does not detail how data was partitioned. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments, such as GPU or CPU models. |
| Software Dependencies | No | The paper mentions using "Adam" for optimization and "PyTorch" for implementation, but it does not specify version numbers for these or other software dependencies. |
| Experiment Setup | Yes | We keep the backbone network architecture the same for all tasks and algorithms, which is a 3-layer MLP (hidden size 256) with Mish [23] activation function... The models are trained for 2000 epochs on Gym-locomotion and 1000 epochs on the other three domains. Each epoch consists of 1000 iterations of policy updates with batch size 256. For DPM-Solver [20], we use the third-order version and set the model call steps to 15. We defer the complete list of all hyperparameters to the appendix due to space limits. (A minimal sketch of this backbone follows the table.) |
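
The Experiment Setup row reports a 3-layer MLP backbone with hidden size 256, Mish activations, Adam optimization, and batch size 256. Below is a minimal PyTorch sketch of such a backbone; reading "3-layer" as three hidden layers, the input/output dimensions, and the learning rate are assumptions not stated in the table.

```python
import torch
import torch.nn as nn

class PolicyBackbone(nn.Module):
    """3-layer MLP (hidden size 256) with Mish activations, as described in the
    Experiment Setup row. "3-layer" is read here as three hidden layers, which is
    one plausible interpretation."""

    def __init__(self, in_dim: int, out_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.Mish(),
            nn.Linear(hidden, hidden), nn.Mish(),
            nn.Linear(hidden, hidden), nn.Mish(),
            nn.Linear(hidden, out_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

# Assumed input/output sizes for illustration only (e.g. a locomotion task);
# Adam and batch size 256 are reported, the learning rate is an assumption.
model = PolicyBackbone(in_dim=24, out_dim=6)
optimizer = torch.optim.Adam(model.parameters(), lr=3e-4)
```

The Pseudocode row refers to an energy-based action-selection procedure (Alg. 2), whose details are not reproduced in this table. The sketch below shows one common form of energy-based selection, offered only as an illustration of the idea: draw several candidate actions from the policy and re-sample one with probability proportional to exp(Q(s, a)). The names `policy_sampler` and `q_net`, the candidate count, and the temperature are all hypothetical.

```python
import torch

@torch.no_grad()
def select_action(policy_sampler, q_net, state, num_candidates=10, temperature=1.0):
    """Hedged sketch of energy-based action selection: sample candidate actions
    from the policy, then re-sample one of them with softmax(Q / temperature)
    weights. This is an illustrative sketch, not the paper's exact Alg. 2."""
    states = state.unsqueeze(0).repeat(num_candidates, 1)   # (N, state_dim)
    actions = policy_sampler(states)                        # (N, action_dim)
    q_values = q_net(states, actions).squeeze(-1)           # (N,)
    probs = torch.softmax(q_values / temperature, dim=0)    # energy-based weights
    idx = torch.multinomial(probs, num_samples=1)
    return actions[idx.item()]
```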
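Both sketches rely only on details quoted in the table (architecture, activation, optimizer, batch size, existence of Alg. 1 and Alg. 2); for the precise training loop, DPM-Solver configuration, and action-selection algorithm, refer to the paper and the released code at https://github.com/sail-sg/edp.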