ATraDiff: Accelerating Online Reinforcement Learning with Imaginary Trajectories
Authors: Qianlan Yang, Yu-Xiong Wang
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical evaluation shows that ATraDiff consistently achieves state-of-the-art performance across a variety of environments, with particularly pronounced improvements in complicated settings. Our code and demo video are available at https://atradiff.github.io. ... We conduct comprehensive experiments to evaluate the effectiveness of our data generator ATraDiff. |
| Researcher Affiliation | Academia | Department of Computer Science, University of Illinois Urbana-Champaign, Urbana, Illinois, USA. |
| Pseudocode | Yes | Algorithm 1 Modified Replay Buffer for RL |
| Open Source Code | Yes | Our code and demo video are available at https://atradiff.github.io. |
| Open Datasets | Yes | We consider 3 environments from D4RL Locomotion (Fu et al., 2020)... D4RL Ant Maze (Fu et al., 2020)... D4RL Kitchen (Fu et al., 2020)... Meta-World (Yu et al., 2019). |
| Dataset Splits | Yes | We consider 3 environments from D4RL Locomotion (Fu et al., 2020)... For evaluation, all results in this section are presented as the median performance over 5 random seeds along with the 25%-75% percentiles. |
| Hardware Specification | Yes | The experiments are conducted on a single NVIDIA RTX 4090TI GPU. |
| Software Dependencies | No | The paper mentions using specific models like Stable Diffusion and SAC/REDQ as baselines, but it does not specify the version numbers for software dependencies or libraries (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | We run both baselines and their variants combined with our ATraDiff for 250K steps. ... For evaluation, all results in this section are presented as the median performance over 5 random seeds along with the 25%-75% percentiles. ... This modified replay buffer D is characterized by two hyperparameters: ρ ∈ [0, 1], denoting the probability of sampling from synthesized data D_s in RL, and L ∈ ℕ, indicating the expected length of synthesized trajectories. |
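Algorithm 1 is not reproduced on this page. Below is a minimal Python sketch of how a replay buffer governed by the two hyperparameters ρ and L could behave. It rests on several assumptions. Each transition in a training batch comes from the synthesized buffer D_s with probability ρ and from the agent's own environment data otherwise. Synthesized trajectory lengths follow a geometric distribution whose mean is L. The class name `MixedReplayBuffer`, the `generator` callback (a stand-in for ATraDiff), and the geometric length law are all hypothetical and do not come from the paper.

```python
import random
from collections import deque
from typing import Any, Callable, Deque, List, Optional, Tuple

# (state, action, reward, next_state, done)
Transition = Tuple[Any, Any, float, Any, bool]


class MixedReplayBuffer:
    """Hypothetical sketch: a real-data buffer plus a synthetic buffer D_s.

    rho: probability of drawing a transition from D_s (rho in [0, 1]).
    expected_len: expected length L of a synthesized trajectory (L >= 1).
    """

    def __init__(self, rho: float, expected_len: int,
                 capacity: int = 1_000_000, seed: Optional[int] = None) -> None:
        if not 0.0 <= rho <= 1.0:
            raise ValueError("rho must lie in [0, 1]")
        if expected_len < 1:
            raise ValueError("expected_len must be a positive integer")
        self.rho = rho
        self.expected_len = expected_len
        self.real: Deque[Transition] = deque(maxlen=capacity)
        self.synthetic: Deque[Transition] = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def add_real(self, transition: Transition) -> None:
        """Store a transition collected by the agent in the environment."""
        self.real.append(transition)

    def sample_synthetic_length(self) -> int:
        """Assumption: lengths follow a geometric law on {1, 2, ...} with mean L."""
        p_stop = 1.0 / self.expected_len
        length = 1
        while self.rng.random() >= p_stop:  # continue with probability 1 - p_stop
            length += 1
        return length

    def add_synthetic(self, generator: Callable[[int], List[Transition]]) -> int:
        """Request one trajectory from a (hypothetical) generator and store it in D_s."""
        length = self.sample_synthetic_length()
        self.synthetic.extend(generator(length))
        return length

    def sample(self, batch_size: int) -> List[Transition]:
        """Draw each transition from D_s with probability rho, else from real data."""
        if not self.real and not self.synthetic:
            raise ValueError("cannot sample from an empty buffer")
        batch: List[Transition] = []
        for _ in range(batch_size):
            source = self.synthetic if self.rng.random() < self.rho else self.real
            if not source:  # fall back to whichever buffer already holds data
                source = self.real or self.synthetic
            batch.append(source[self.rng.randrange(len(source))])
        return batch
```

Setting ρ = 0 recovers ordinary sampling from the agent's own experience. The fallback in `sample` is only a convenience for early training, when one of the two buffers may still be empty.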
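The evaluation protocol quoted in the table reports the median over 5 random seeds, together with the 25%-75% percentiles. Below is a small Python sketch of that aggregation, applied separately at each evaluation checkpoint. The excerpt does not say how the percentiles are interpolated. This sketch assumes linear interpolation between order statistics (`method="inclusive"` in `statistics.quantiles`, which gives the same values as NumPy's default `numpy.percentile`). The function names are illustrative only.

```python
import statistics
from typing import List, Sequence, Tuple


def median_iqr(values: Sequence[float]) -> Tuple[float, float, float]:
    """Return (25th percentile, median, 75th percentile) using linear interpolation."""
    if len(values) < 2:
        raise ValueError("need at least two values, e.g. one per seed")
    # With n=4, quantiles() returns the three quartile cut points; the middle one is the median.
    q25, q50, q75 = statistics.quantiles(values, n=4, method="inclusive")
    return q25, q50, q75


def aggregate_curves(curves: Sequence[Sequence[float]]) -> List[Tuple[float, float, float]]:
    """curves[i][t] is the evaluation return of seed i at checkpoint t."""
    return [median_iqr(step) for step in zip(*curves)]
```

For example, `aggregate_curves([[1, 10], [2, 20], [3, 30], [4, 40], [5, 50]])` gives `[(2.0, 3.0, 4.0), (20.0, 30.0, 40.0)]`, one (25%, median, 75%) triple per checkpoint.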