Energy-Guided Diffusion Sampling for Offline-to-Online Reinforcement Learning

Authors: Xu-Hui Liu, Tian-Shuo Liu, Shengyi Jiang, Ruifeng Chen, Zhilong Zhang, Xinwei Chen, Yang Yu

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we empirically validate the effectiveness of EDIS. Sec. 5.1 showcases the considerable performance enhancement achieved by EDIS when integrated with off-the-shelf offline-to-online algorithms. ... We observe a notable 20% average improvement in empirical performance on MuJoCo, AntMaze, and Adroit environments.
Researcher Affiliation | Collaboration | 1 National Key Laboratory for Novel Software Technology, Nanjing University, China & School of Artificial Intelligence, Nanjing University, China; 2 Polixir Technologies; 3 Department of Computer Science, The University of Hong Kong, Hong Kong, China.
Pseudocode | Yes | Algorithm 1: Energy-Guided Diffusion Sampling in Offline-to-Online Reinforcement Learning
Open Source Code | Yes | Code is available at https://github.com/liuxhym/EDIS.
Open Datasets | Yes | We evaluate the performance of EDIS on three benchmark tasks from D4RL (Fu et al., 2020): MuJoCo Locomotion, AntMaze Navigation, and Adroit Manipulation.
Dataset Splits | No | The paper uses standard D4RL benchmarks but does not explicitly detail the specific training/validation/test splits within the text.
Hardware Specification | Yes | We train EDIS integrated with base algorithms on an NVIDIA RTX 4090, with approximately 4 hours required for 0.2M fine-tuning steps on MuJoCo Locomotion and Adroit Manipulation, and 6 hours for AntMaze Navigation. ... Table 7. Computational consumption of different algorithms. ... Maximal GPU memory
Software Dependencies | No | We use the PyTorch implementation of Cal-QL and IQL from https://github.com/tinkoff-ai/CORL, and primarily followed the authors' recommended parameters (Tarasov et al., 2022). (No version is specified for PyTorch.)
Experiment Setup | Yes | The hyperparameters used in our EDIS module are detailed in Tab. 3. Table 3 (hyperparameters and their values in EDIS): Denoising Network Type: Residual MLP; Denoising Network Depth: 6 layers; Denoising Steps: 128; Denoising Network Learning Rate: 3×10⁻⁴; Denoising Network Hidden Dimension: 1024 units; Denoising Network Batch Size: 256 samples; Denoising Network Activation: ReLU; Denoising Network Optimizer: Adam; Denoising Network Learning Rate Schedule: Cosine Annealing; Denoising Network Training Epochs: 50,000; Denoising Network Training Interval: every 10,000 environment steps; Energy Network Hidden Dimension: 256 units; Negative Samples (Energy Network Training): 10; Energy Network Learning Rate: 1×10⁻³; Energy Network Activation: ReLU; Energy Network Optimizer: Adam.
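For reference, the Table 3 settings above can be gathered into a single configuration object. The sketch below is illustrative only: the dictionary structure and key names are hypothetical choices made here for readability, not the configuration format of the official repository at https://github.com/liuxhym/EDIS.

```python
# Illustrative collection of the EDIS hyperparameters reported in Table 3.
# Key names are hypothetical; consult the official EDIS repository for the
# exact configuration format the authors use.
EDIS_CONFIG = {
    "denoising_network": {
        "type": "residual_mlp",
        "depth": 6,                          # layers
        "denoising_steps": 128,
        "learning_rate": 3e-4,
        "hidden_dim": 1024,
        "batch_size": 256,
        "activation": "relu",
        "optimizer": "adam",
        "lr_schedule": "cosine_annealing",
        "training_epochs": 50_000,
        "training_interval_env_steps": 10_000,  # retrain every 10k env steps
    },
    "energy_network": {
        "hidden_dim": 256,
        "negative_samples": 10,              # per energy-network update
        "learning_rate": 1e-3,
        "activation": "relu",
        "optimizer": "adam",
    },
}
```

A flat record like this makes it easy to diff reported hyperparameters against a reimplementation when attempting to reproduce the results.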