Energy-Guided Diffusion Sampling for Offline-to-Online Reinforcement Learning
Authors: Xu-Hui Liu, Tian-Shuo Liu, Shengyi Jiang, Ruifeng Chen, Zhilong Zhang, Xinwei Chen, Yang Yu
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we empirically validate the effectiveness of EDIS. Sec. 5.1 showcases the considerable performance enhancement achieved by EDIS when integrated with off-the-shelf offline-to-online algorithms. ... We observe a notable 20% average improvement in empirical performance on MuJoCo, AntMaze, and Adroit environments. |
| Researcher Affiliation | Collaboration | 1National Key Laboratory for Novel Software Technology, Nanjing University, China & School of Artificial Intelligence, Nanjing University, China 2Polixir Technologies 3Department of Computer Science, The University of Hong Kong, Hong Kong, China. |
| Pseudocode | Yes | Algorithm 1 Energy-Guided Diffusion Sampling in Offline-to-Online Reinforcement Learning |
| Open Source Code | Yes | Code is available at https://github.com/liuxhym/EDIS. |
| Open Datasets | Yes | We evaluate the performance of EDIS on three benchmark tasks from D4RL (Fu et al., 2020): MuJoCo Locomotion, AntMaze Navigation, and Adroit Manipulation. |
| Dataset Splits | No | The paper uses standard D4RL benchmarks but does not explicitly detail the specific training/validation/test splits within the text. |
| Hardware Specification | Yes | We train EDIS integrated with base algorithms on an NVIDIA RTX 4090, with approximately 4 hours required for 0.2M fine-tuning on MuJoCo Locomotion and Adroit Manipulation, while 6 hours for AntMaze Navigation. ... Table 7. Computational consumption of different algorithms. ... Maximal GPU memory |
| Software Dependencies | No | We use the PyTorch implementation of Cal-QL and IQL from https://github.com/tinkoff-ai/CORL, and primarily followed the authors' recommended parameters (Tarasov et al., 2022). (No version specified for PyTorch.) |
| Experiment Setup | Yes | The hyperparameters used in our EDIS module are detailed in Tab. 3: Table 3. Hyperparameters and their values in EDIS: Network Type (Denoising) Residual MLP, Denoising Network Depth 6 layers, Denoising Steps 128 steps, Denoising Network Learning Rate 3×10⁻⁴, Denoising Network Hidden Dimension 1024 units, Denoising Network Batch Size 256 samples, Denoising Network Activation Function ReLU, Denoising Network Optimizer Adam, Learning Rate Schedule (Denoising Network) Cosine Annealing, Training Epochs (Denoising Network) 50,000 epochs, Training Interval Environment Step (Denoising Network) Every 10,000 steps, Energy Network Hidden Dimension 256 units, Negative Samples (Energy Network Training) 10, Energy Network Learning Rate 1×10⁻³, Energy Network Activation Function ReLU, Energy Network Optimizer Adam. |
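For reproduction purposes, the Table 3 hyperparameters quoted above could be collected into a single configuration object along the following lines. This is only a sketch: the dictionary keys are illustrative names chosen here, not identifiers from the EDIS codebase; the values are those reported in the paper's table.

```python
# Illustrative configuration capturing the EDIS hyperparameters reported in
# Table 3 of the paper. Key names are assumptions; values come from the table.
EDIS_HYPERPARAMS = {
    "denoising_network": {
        "type": "Residual MLP",
        "depth": 6,                       # layers
        "denoising_steps": 128,
        "learning_rate": 3e-4,
        "hidden_dim": 1024,
        "batch_size": 256,
        "activation": "ReLU",
        "optimizer": "Adam",
        "lr_schedule": "Cosine Annealing",
        "training_epochs": 50_000,
        "training_interval_env_steps": 10_000,
    },
    "energy_network": {
        "hidden_dim": 256,
        "negative_samples": 10,           # per positive, for energy training
        "learning_rate": 1e-3,
        "activation": "ReLU",
        "optimizer": "Adam",
    },
}
```

Keeping the two networks' settings in separate sub-dictionaries mirrors the split in the original table between the denoising network and the energy network.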