ATraDiff: Accelerating Online Reinforcement Learning with Imaginary Trajectories

Authors: Qianlan Yang, Yu-Xiong Wang

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical evaluation shows that ATraDiff consistently achieves state-of-the-art performance across a variety of environments, with particularly pronounced improvements in complicated settings. Our code and demo video are available at https://atradiff.github.io. ... We conduct comprehensive experiments to evaluate the effectiveness of our data generator ATraDiff.
Researcher Affiliation | Academia | Department of Computer Science, University of Illinois Urbana-Champaign, Urbana, Illinois, USA.
Pseudocode | Yes | Algorithm 1 Modified Replay Buffer for RL
Open Source Code | Yes | Our code and demo video are available at https://atradiff.github.io.
Open Datasets | Yes | We consider 3 environments from D4RL Locomotion (Fu et al., 2020)... D4RL AntMaze (Fu et al., 2020)... D4RL Kitchen (Fu et al., 2020)... Meta-World (Yu et al., 2019).
Dataset Splits | Yes | We consider 3 environments from D4RL Locomotion (Fu et al., 2020)... For evaluation, all results in this section are presented as the median performance over 5 random seeds along with the 25%-75% percentiles.
Hardware Specification | Yes | The experiments are conducted on a single NVIDIA RTX 4090TI GPU.
Software Dependencies | No | The paper mentions using specific models like Stable Diffusion and SAC/REDQ as baselines, but it does not specify the version numbers for software dependencies or libraries (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup | Yes | We run both baselines and their variants combined with our ATraDiff for 250K steps. ... For evaluation, all results in this section are presented as the median performance over 5 random seeds along with the 25%-75% percentiles. ... This modified replay buffer D is characterized by two hyperparameters: ρ ∈ [0, 1], denoting the probability of sampling from synthesized data D_s in RL, and L ∈ ℕ, indicating the expected length of synthesized trajectories.
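
The role of ρ in the Experiment Setup entry can be illustrated with a small sketch. This is not the paper's Algorithm 1: the class name `MixedReplayBuffer` and its methods are hypothetical, transitions are treated as opaque Python objects, and the trajectory-length hyperparameter L is not modeled at all. The sketch only shows one plausible reading of "probability of sampling from synthesized data": each sampled transition is drawn from the synthesized pool with probability ρ and from the real pool otherwise.

```python
import random


class MixedReplayBuffer:
    """Toy buffer mixing real and synthesized transitions (hypothetical API)."""

    def __init__(self, rho, seed=None):
        if not 0.0 <= rho <= 1.0:
            raise ValueError("rho must lie in [0, 1]")
        self.rho = rho          # probability of drawing from synthesized data
        self.real = []          # transitions collected from the environment
        self.synthetic = []     # transitions taken from generated trajectories
        self.rng = random.Random(seed)

    def add_real(self, transition):
        self.real.append(transition)

    def add_synthetic(self, transition):
        self.synthetic.append(transition)

    def sample(self, batch_size):
        if not self.real and not self.synthetic:
            raise ValueError("cannot sample from an empty buffer")
        batch = []
        for _ in range(batch_size):
            want_synthetic = self.rng.random() < self.rho
            # Assumption: fall back to the other pool if the chosen one is empty.
            if want_synthetic and self.synthetic:
                pool = self.synthetic
            elif self.real:
                pool = self.real
            else:
                pool = self.synthetic
            batch.append(self.rng.choice(pool))
        return batch
```

With ρ = 0 this reduces to an ordinary replay buffer over real data, and with ρ = 1 the learner trains on synthesized transitions only (as long as any exist). Intermediate values trade off the two sources.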