Reasoning with Latent Diffusion in Offline Reinforcement Learning
Authors: Siddarth Venkatraman, Shivesh Khaitan, Ravi Tej Akella, John Dolan, Jeff Schneider, Glen Berseth
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In our experiments, we focus on 1) studying how helpful temporal abstraction is in distinguishing latent skills (Section 5.1), 2) evaluating the ability of diffusion models to sample from the latent space (Sections 5.2 and 5.3), and 3) evaluating the performance of our method on the D4RL offline RL benchmarks (Section 5.4). |
| Researcher Affiliation | Academia | Siddarth Venkatraman (1), Shivesh Khaitan (2), Ravi Tej Akella (2), John Dolan (2), Jeff Schneider (2), Glen Berseth (1); (1) Mila, Université de Montréal; (2) Carnegie Mellon University; Equal Contribution |
| Pseudocode | Yes | Algorithm 1 Latent Diffusion-Constrained Q-Learning (LDCQ) |
| Open Source Code | Yes | The source code is available at: https://github.com/ldcq/ldcq. |
| Open Datasets | Yes | Our experiments were conducted on the open D4RL benchmark datasets (Fu et al., 2020). |
| Dataset Splits | No | The paper mentions using the D4RL benchmark datasets but does not explicitly state dataset split information (e.g., exact percentages or sample counts for train/validation/test) within the text; see the illustrative loading sketch after this table. |
| Hardware Specification | Yes | The models were trained on an NVIDIA RTX A6000 GPU. |
| Software Dependencies | No | The paper does not provide specific version numbers for software dependencies (e.g., programming languages, libraries, or frameworks). |
| Experiment Setup | Yes | In maze2d and Ant Maze tasks we use H = 30, in kitchen tasks we use H = 20 and in locomotion and adroit tasks we use H = 10. We train our diffusion prior with T = 200 diffusion steps. The other hyperparameters which are constant across tasks are provided in the supplemental material. |
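The Pseudocode row above names Algorithm 1, Latent Diffusion-Constrained Q-Learning (LDCQ), but does not reproduce it. Below is a minimal sketch of the evaluation-time latent selection the algorithm's name describes; the module interfaces (`diffusion_prior.sample`, `q_network`) and the `num_samples` default are assumptions for illustration, not the released implementation.

```python
import torch

def ldcq_select_latent(state, diffusion_prior, q_network, num_samples=100):
    """Sketch of latent-diffusion-constrained selection of a latent skill.

    Assumed interfaces: diffusion_prior.sample(states) returns one latent per
    input state, and q_network(states, latents) returns a Q-value per pair.
    """
    with torch.no_grad():
        # Draw candidate latent skills z ~ p(z | s) from the diffusion prior so the
        # Q-maximization stays constrained to in-distribution latents.
        states = state.unsqueeze(0).expand(num_samples, -1)
        latents = diffusion_prior.sample(states)
        # Score each candidate with the learned Q-function over (state, latent) pairs.
        q_values = q_network(states, latents).squeeze(-1)
        # Greedily pick the highest-scoring sampled latent.
        return latents[q_values.argmax()]
```

In the paper, the selected latent would then be decoded into a length-H action segment (see the Experiment Setup row for the per-domain values of H) before the agent re-plans.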
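Regarding the Open Datasets and Dataset Splits rows: D4RL ships each task's data as a single offline buffer, so any validation split has to be carved out by the user. Below is a minimal loading sketch, assuming the standard `gym`/`d4rl` packages; the 90/10 split is an illustrative assumption, not a setting taken from the paper.

```python
import gym
import d4rl  # importing d4rl registers the benchmark environments with gym

# Load one of the benchmark datasets used in the paper (maze2d as an example).
env = gym.make("maze2d-umaze-v1")
dataset = env.get_dataset()  # dict with 'observations', 'actions', 'rewards', 'terminals', ...

# Carve out a held-out slice manually; the 90/10 ratio below is an assumption
# for illustration only and is not reported in the paper.
n = dataset["observations"].shape[0]
split = int(0.9 * n)
train_obs, val_obs = dataset["observations"][:split], dataset["observations"][split:]
```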
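The horizon and diffusion-step values quoted in the Experiment Setup row can be collected into a single configuration sketch. The dictionary name and layout are assumptions; only the numeric values come from the paper.

```python
# Segment horizon H per task domain and diffusion steps T for the latent prior,
# as reported in the paper; everything else (names, structure) is illustrative.
HORIZON = {
    "maze2d": 30,
    "antmaze": 30,
    "kitchen": 20,
    "locomotion": 10,
    "adroit": 10,
}
DIFFUSION_STEPS = 200  # T, number of diffusion steps for the prior
```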