Variational Delayed Policy Optimization
Authors: Qingyuan Wu, Simon Zhan, Yixuan Wang, Yuhui Wang, Chung-Wei Lin, Chen Lv, Qi Zhu, Chao Huang
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We not only provide a theoretical analysis of VDPO in terms of sample complexity and performance, but also empirically demonstrate that VDPO achieves performance consistent with SOTA methods while significantly improving sample efficiency (approximately 50% fewer samples) on the MuJoCo benchmark. |
| Researcher Affiliation | Academia | Qingyuan Wu (University of Southampton); Simon Sinong Zhan (Northwestern University); Yixuan Wang (Northwestern University); Yuhui Wang (King Abdullah University of Science and Technology); Chung-Wei Lin (National Taiwan University); Chen Lv (Nanyang Technological University); Qi Zhu (Northwestern University); Chao Huang (University of Southampton) |
| Pseudocode | Yes | The pseudocode of VDPO is summarized in Alg. 1 |
| Open Source Code | Yes | Code is available at https://github.com/QingyuanWuNothing/VDPO. |
| Open Datasets | Yes | We evaluate our VDPO in the MuJoCo benchmark [35]. |
| Dataset Splits | No | The paper does not provide specific dataset split information (e.g., exact percentages, sample counts, or detailed splitting methodology) for a validation set. |
| Hardware Specification | Yes | Each run of VDPO takes approximately 6 hours on 1 NVIDIA A100 GPU and 8 Intel Xeon CPUs. |
| Software Dependencies | No | The implementation of VDPO is based on CleanRL [16], and we also provide the code and guidelines to reproduce our results in the supplemental material. |
| Experiment Setup | Yes | The hyper-parameter settings are presented in Appendix A. We investigate sample efficiency (Sec. 4.2.1), followed by a performance comparison under different delay settings (Sec. 4.2.2). We also conduct an ablation study on the representation of VDPO (Sec. 4.2.3). Each method was run over 10 random seeds. The training curves can be found in Appendix E. |
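
The experiment-setup row above reports a 10-seed evaluation on MuJoCo tasks with a CleanRL-based implementation. As a rough illustration of that protocol only, the sketch below shows a seeded multi-run loop; `train_vdpo` is a hypothetical placeholder (here it just rolls out a random policy), and the task subset is an assumption, not the authors' configuration.

```python
# Hedged sketch of a multi-seed MuJoCo evaluation loop, assuming gymnasium[mujoco].
# train_vdpo is a hypothetical placeholder, NOT the authors' VDPO implementation.
import random

import gymnasium as gym
import numpy as np


def train_vdpo(env: gym.Env, seed: int) -> float:
    """Placeholder training/evaluation run; returns the episodic return of a random policy."""
    obs, _ = env.reset(seed=seed)
    total_reward, done = 0.0, False
    while not done:
        action = env.action_space.sample()  # stand-in for the learned delayed policy
        obs, reward, terminated, truncated, _ = env.step(action)
        total_reward += float(reward)
        done = terminated or truncated
    return total_reward


if __name__ == "__main__":
    # Illustrative subset of MuJoCo tasks; the paper evaluates the full benchmark.
    for task in ["HalfCheetah-v4", "Walker2d-v4"]:
        returns = []
        for seed in range(10):  # 10 random seeds, matching the reported setup
            random.seed(seed)
            np.random.seed(seed)
            env = gym.make(task)
            returns.append(train_vdpo(env, seed))
            env.close()
        print(f"{task}: mean={np.mean(returns):.1f}, std={np.std(returns):.1f}")
```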