Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
ATraDiff: Accelerating Online Reinforcement Learning with Imaginary Trajectories
Authors: Qianlan Yang, Yu-Xiong Wang
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical evaluation shows that ATraDiff consistently achieves state-of-the-art performance across a variety of environments, with particularly pronounced improvements in complicated settings. Our code and demo video are available at https://atradiff.github.io. ... We conduct comprehensive experiments to evaluate the effectiveness of our data generator ATraDiff. |
| Researcher Affiliation | Academia | Department of Computer Science, University of Illinois Urbana-Champaign, Urbana, Illinois, USA. |
| Pseudocode | Yes | Algorithm 1 Modified Replay Buffer for RL |
| Open Source Code | Yes | Our code and demo video are available at https://atradiff.github.io. |
| Open Datasets | Yes | We consider 3 environments from D4RL Locomotion (Fu et al., 2020)... D4RL Ant Maze (Fu et al., 2020)... D4RL Kitchen (Fu et al., 2020)... Meta-World (Yu et al., 2019). |
| Dataset Splits | Yes | We consider 3 environments from D4RL Locomotion (Fu et al., 2020)... For evaluation, all results in this section are presented as the median performance over 5 random seeds along with the 25%-75% percentiles. |
| Hardware Specification | Yes | The experiments are conducted on a single NVIDIA RTX 4090TI GPU. |
| Software Dependencies | No | The paper names specific models, such as Stable Diffusion and the SAC and REDQ baselines, but does not specify version numbers for its software dependencies or libraries (e.g., Python, PyTorch, or TensorFlow). |
| Experiment Setup | Yes | We run both baselines and their variants combined with our ATraDiff for 250K steps. ... For evaluation, all results in this section are presented as the median performance over 5 random seeds along with the 25%-75% percentiles. ... This modified replay buffer D is characterized by two hyperparameters: ρ ∈ [0, 1], denoting the probability of sampling from synthesized data D_s in RL, and L ∈ ℕ, indicating the expected length of synthesized trajectories. |
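The Experiment Setup row describes the modified replay buffer (Algorithm 1) only through its two hyperparameters, ρ and L. Below is a minimal Python sketch of such a buffer. It assumes that each slot in a training batch is drawn independently from the synthesized data D_s with probability ρ and otherwise from real environment data. The class name `MixedReplayBuffer`, the geometric distribution used to realize an expected trajectory length L, and the stand-in `toy_generator` are all hypothetical illustrations. None of them is taken from ATraDiff's Algorithm 1 or its released code.

```python
import random
from dataclasses import dataclass, field
from typing import Any, Callable, List, Tuple

# One transition: (state, action, reward, next_state, done)
Transition = Tuple[Any, Any, float, Any, bool]


@dataclass
class MixedReplayBuffer:
    """Hypothetical replay buffer mixing real and synthesized transitions."""

    rho: float         # probability of drawing a transition from synthesized data D_s
    expected_len: int  # L: expected length of a synthesized trajectory
    real: List[Transition] = field(default_factory=list)
    synth: List[Transition] = field(default_factory=list)

    def __post_init__(self) -> None:
        if not 0.0 <= self.rho <= 1.0:
            raise ValueError("rho must lie in [0, 1]")
        if self.expected_len < 1:
            raise ValueError("expected_len (L) must be a positive integer")

    def add_real(self, transition: Transition) -> None:
        self.real.append(transition)

    def add_synthetic(self, trajectory: List[Transition]) -> None:
        self.synth.extend(trajectory)

    def sample_length(self, rng: random.Random) -> int:
        # Assumption: lengths are geometric on {1, 2, ...} with success prob 1/L, so the mean is L.
        p = 1.0 / self.expected_len
        n = 1
        while rng.random() >= p:
            n += 1
        return n

    def synthesize(self, generator: Callable[[int], List[Transition]], rng: random.Random) -> None:
        # `generator` is a hypothetical stand-in for a trajectory generator such as ATraDiff.
        self.add_synthetic(generator(self.sample_length(rng)))

    def sample(self, batch_size: int, rng: random.Random) -> List[Transition]:
        if not self.real:
            raise ValueError("no real transitions to sample from")
        batch = []
        for _ in range(batch_size):
            # Each slot comes from D_s with probability rho (if D_s is non-empty), else from real data.
            use_synth = bool(self.synth) and rng.random() < self.rho
            batch.append(rng.choice(self.synth if use_synth else self.real))
        return batch


def toy_generator(n: int) -> List[Transition]:
    # Negative state ids make synthetic transitions easy to tell apart in this demo.
    return [(-k, 0, 0.0, -k - 1, False) for k in range(1, n + 1)]


rng = random.Random(0)
buf = MixedReplayBuffer(rho=0.5, expected_len=10)
for i in range(100):
    buf.add_real((i, 0, 0.0, i + 1, False))
buf.synthesize(toy_generator, rng)
batch = buf.sample(256, rng)
synthetic_fraction = sum(t[0] < 0 for t in batch) / len(batch)
```

With ρ = 0.5, roughly half of each batch comes from D_s. Setting ρ = 0 recovers a plain online replay buffer, and ρ = 1 trains only on synthesized data once any exists. The sketch uses per-slot Bernoulli draws. A fixed split of ⌊ρ·B⌋ synthetic samples per batch of size B would be an equally plausible reading of "probability of sampling".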
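The evaluation protocol quoted in the Dataset Splits and Experiment Setup rows reports the median over 5 random seeds with 25%-75% percentile bands. Computing that summary from raw per-seed learning curves might look like the Python sketch below. The sketch makes several assumptions. `summarize_seeds` is a hypothetical helper, and all seeds are assumed to share one evaluation schedule. Percentiles use linear interpolation between order statistics, although the paper does not state which percentile definition it uses. The toy numbers are illustrative only and are not ATraDiff results.

```python
import statistics
from typing import List, Sequence, Tuple


def summarize_seeds(curves: Sequence[Sequence[float]]) -> List[Tuple[float, float, float]]:
    """Return (25th percentile, median, 75th percentile) at each evaluation step.

    curves[i][t] is the return of seed i at evaluation step t.
    """
    if len(curves) < 2:
        raise ValueError("need at least two seeds to form quartiles")
    if len({len(c) for c in curves}) != 1:
        raise ValueError("all seeds must have the same number of evaluation points")
    summary = []
    for step_values in zip(*curves):
        # method="inclusive" interpolates linearly between order statistics,
        # matching numpy.percentile's default; the middle cut point equals the median.
        q25, med, q75 = statistics.quantiles(step_values, n=4, method="inclusive")
        summary.append((q25, med, q75))
    return summary


# Toy numbers for illustration only (NOT results from the paper): 5 seeds x 3 evaluation points.
toy = [[1, 2, 3], [2, 3, 4], [3, 4, 5], [4, 5, 6], [5, 6, 7]]
for step, (lo, med, hi) in enumerate(summarize_seeds(toy)):
    print(f"step {step}: median={med:.1f}, 25-75% band=[{lo:.1f}, {hi:.1f}]")
```

Reporting the median with an interquartile band rather than the mean with standard deviation keeps a single outlier seed from dominating the summary. With only 5 seeds, however, the band reflects just a few order statistics and should be read as a rough spread.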