Diffusion Policies Creating a Trust Region for Offline Reinforcement Learning

Authors: Tianyu Chen, Zhendong Wang, Mingyuan Zhou

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate its effectiveness and algorithmic characteristics against popular Kullback-Leibler divergence-based distillation methods in 2D bandit scenarios and gym tasks. We then show that DTQL could not only outperform other methods on the majority of the D4RL benchmark tasks but also demonstrate efficiency in training and inference speeds. In this section, we evaluate our method using the popular D4RL benchmark [Fu et al., 2020]. We further compare our training and inference efficiency against other baseline methods. Additionally, an ablation study on the negative log-likelihood (NLL) term and one-step policy choice is presented.
Researcher Affiliation | Academia | Tianyu Chen, Zhendong Wang, Mingyuan Zhou; The University of Texas at Austin; {tianyuchen, zhendong.wang}@utexas.edu, mingyuan.zhou@mccombs.utexas.edu
Pseudocode | Yes | We summarize our algorithm in Algorithm 1.
Open Source Code | Yes | The PyTorch implementation is available at https://github.com/TianyuCodings/Diffusion_Trusted_Q_Learning.
Open Datasets | Yes | In this section, we evaluate our method using the popular D4RL benchmark [Fu et al., 2020]. (A dataset-loading sketch follows the table.)
Dataset Splits | No | The paper mentions using a 'static dataset D' and evaluating on D4RL benchmarks, which typically have predefined splits. However, it does not explicitly state specific training/test/validation split percentages or sample counts within the paper's text.
Hardware Specification | Yes | All experiments were performed on a server equipped with eight RTX A5000 GPUs, each with 24GB of memory.
Software Dependencies | No | The paper mentions a 'PyTorch implementation' and the 'Adam' optimizer, but it does not specify version numbers for PyTorch or any other software dependencies.
Experiment Setup | Yes | Hyperparameters: In D4RL benchmarks, for all Antmaze tasks, we incorporate an NLL term, while for other tasks, this term is omitted. Additionally, we adjust the parameter α for different tasks. Details on hyperparameters and implementation are provided in Appendices D and E. Table 4: Hyperparameters for D4RL benchmarks. One epoch represents 1k steps, and the optimizer used is Adam. (A training-loop sketch illustrating this convention follows the table.)
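
As a companion to the Open Datasets row, the following is a minimal sketch of how D4RL benchmark data is typically loaded and how normalized scores are obtained. The task name and the use of d4rl.qlearning_dataset are illustrative assumptions, not taken from the paper's released code.

```python
# Minimal sketch of loading a D4RL dataset (assumption: standard gym + d4rl API;
# the task name below is illustrative, not taken from the paper).
import gym
import d4rl  # registers the D4RL environments with gym on import

env = gym.make("halfcheetah-medium-v2")
dataset = d4rl.qlearning_dataset(env)   # dict of numpy arrays

print(dataset["observations"].shape)    # (N, obs_dim)
print(dataset["actions"].shape)         # (N, act_dim)
print(dataset["rewards"].shape)         # (N,)

# D4RL also provides normalized scores, which is how benchmark results are reported.
raw_return = 4500.0  # placeholder episode return
print(env.get_normalized_score(raw_return) * 100)
```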
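
The Experiment Setup row states that one epoch corresponds to 1k gradient steps and that Adam is the optimizer. The skeleton below shows one way such a loop could be organized in PyTorch; the network, learning rate, batch construction, and loss are placeholders and are not the paper's actual DTQL implementation.

```python
# Hedged training-loop skeleton: 1k gradient steps per "epoch" with Adam,
# mirroring the convention stated in the hyperparameter table. The policy
# network, learning rate, and loss below are illustrative placeholders.
import torch
import torch.nn as nn

policy = nn.Sequential(nn.Linear(17, 256), nn.ReLU(), nn.Linear(256, 6))
optimizer = torch.optim.Adam(policy.parameters(), lr=3e-4)  # lr is a placeholder

steps_per_epoch = 1000  # "one epoch represents 1k steps"
num_epochs = 5          # placeholder; the paper tunes training length per task

for epoch in range(num_epochs):
    for step in range(steps_per_epoch):
        obs = torch.randn(256, 17)     # stand-in for a sampled D4RL batch
        target = torch.randn(256, 6)   # stand-in for a behavior-cloning target
        loss = ((policy(obs) - target) ** 2).mean()  # placeholder loss, not DTQL's objective
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    # evaluation / logging would typically run once per epoch
```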