Reasoning with Latent Diffusion in Offline Reinforcement Learning
Authors: Siddarth Venkatraman, Shivesh Khaitan, Ravi Tej Akella, John Dolan, Jeff Schneider, Glen Berseth
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In our experiments, we focus on 1) studying how helpful temporal abstraction is in distinguishing latent skills (Section 5.1), 2) evaluating the ability of diffusion models to sample from the latent space (Sections 5.2 and 5.3), and 3) evaluating the performance of our method on the D4RL offline RL benchmarks (Section 5.4). |
| Researcher Affiliation | Academia | Siddarth Venkatraman (1), Shivesh Khaitan (2), Ravi Tej Akella (2), John Dolan (2), Jeff Schneider (2), Glen Berseth (1); (1) Mila, Université de Montréal; (2) Carnegie Mellon University; Equal Contribution |
| Pseudocode | Yes | Algorithm 1 Latent Diffusion-Constrained Q-Learning (LDCQ) |
| Open Source Code | Yes | The source code is available at: https://github.com/ldcq/ldcq. |
| Open Datasets | Yes | Our experiments were conducted on the open D4RL benchmark datasets (Fu et al., 2020). |
| Dataset Splits | No | The paper mentions using the D4RL benchmark datasets but does not explicitly state dataset split information (e.g., exact percentages or sample counts for train/validation/test) within the text; see the illustrative loading sketch after this table. |
| Hardware Specification | Yes | The models were trained on an NVIDIA RTX A6000 GPU. |
| Software Dependencies | No | The paper does not provide specific version numbers for software dependencies (e.g., programming languages, libraries, or frameworks). |
| Experiment Setup | Yes | In maze2d and Ant Maze tasks we use H = 30, in kitchen tasks we use H = 20 and in locomotion and adroit tasks we use H = 10. We train our diffusion prior with T = 200 diffusion steps. The other hyperparameters which are constant across tasks are provided in the supplemental material. |
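The Pseudocode row above names Algorithm 1, Latent Diffusion-Constrained Q-Learning (LDCQ), but does not reproduce it. Below is a minimal sketch of the evaluation-time latent selection the algorithm's name describes; the module interfaces (`diffusion_prior.sample`, `q_network`) and the `num_samples` default are assumptions for illustration, not the released implementation.

```python
import torch

def ldcq_select_latent(state, diffusion_prior, q_network, num_samples=100):
    """Sketch of latent-diffusion-constrained selection of a latent skill.

    Assumed interfaces: diffusion_prior.sample(states) returns one latent per
    input state, and q_network(states, latents) returns a Q-value per pair.
    """
    with torch.no_grad():
        # Draw candidate latent skills z ~ p(z | s) from the diffusion prior so the
        # Q-maximization stays constrained to in-distribution latents.
        states = state.unsqueeze(0).expand(num_samples, -1)
        latents = diffusion_prior.sample(states)
        # Score each candidate with the learned Q-function over (state, latent) pairs.
        q_values = q_network(states, latents).squeeze(-1)
        # Greedily pick the highest-scoring sampled latent.
        return latents[q_values.argmax()]
```

In the paper, the selected latent would then be decoded into a length-H action segment (see the Experiment Setup row for the per-domain values of H) before the agent re-plans.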
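Regarding the Open Datasets and Dataset Splits rows: D4RL ships each task's data as a single offline buffer, so any validation split has to be carved out by the user. Below is a minimal loading sketch, assuming the standard `gym`/`d4rl` packages; the 90/10 split is an illustrative assumption, not a setting taken from the paper.

```python
import gym
import d4rl  # importing d4rl registers the benchmark environments with gym

# Load one of the benchmark datasets used in the paper (maze2d as an example).
env = gym.make("maze2d-umaze-v1")
dataset = env.get_dataset()  # dict with 'observations', 'actions', 'rewards', 'terminals', ...

# Carve out a held-out slice manually; the 90/10 ratio below is an assumption
# for illustration only and is not reported in the paper.
n = dataset["observations"].shape[0]
split = int(0.9 * n)
train_obs, val_obs = dataset["observations"][:split], dataset["observations"][split:]
```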
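The horizon and diffusion-step values quoted in the Experiment Setup row can be collected into a single configuration sketch. The dictionary name and layout are assumptions; only the numeric values come from the paper.

```python
# Segment horizon H per task domain and diffusion steps T for the latent prior,
# as reported in the paper; everything else (names, structure) is illustrative.
HORIZON = {
    "maze2d": 30,
    "antmaze": 30,
    "kitchen": 20,
    "locomotion": 10,
    "adroit": 10,
}
DIFFUSION_STEPS = 200  # T, number of diffusion steps for the prior
```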