reproducibilityindex.ai

ATraDiff: Accelerating Online Reinforcement Learning with Imaginary Trajectories

Authors: Qianlan Yang, Yu-Xiong Wang

ICML 2024

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Empirical evaluation shows that ATraDiff consistently achieves state-of-the-art performance across a variety of environments, with particularly pronounced improvements in complicated settings. Our code and demo video are available at https://atradiff.github.io. ... We conduct comprehensive experiments to evaluate the effectiveness of our data generator ATraDiff.
Researcher Affiliation	Academia	Department of Computer Science, University of Illinois Urbana-Champaign, Urbana, Illinois, USA.
Pseudocode	Yes	Algorithm 1 Modified Replay Buffer for RL
Open Source Code	Yes	Our code and demo video are available at https://atradiff.github.io.
Open Datasets	Yes	We consider 3 environments from D4RL Locomotion (Fu et al., 2020)... D4RL AntMaze (Fu et al., 2020)... D4RL Kitchen (Fu et al., 2020)... Meta-World (Yu et al., 2019).
Dataset Splits	Yes	We consider 3 environments from D4RL Locomotion (Fu et al., 2020)... For evaluation, all results in this section are presented as the median performance over 5 random seeds along with the 25%-75% percentiles.
Hardware Specification	Yes	The experiments are conducted on a single NVIDIA RTX 4090TI GPU.
Software Dependencies	No	The paper mentions using specific models like Stable Diffusion and SAC/REDQ as baselines, but it does not specify the version numbers for software dependencies or libraries (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup	Yes	We run both baselines and their variants combined with our ATraDiff for 250K steps. ... For evaluation, all results in this section are presented as the median performance over 5 random seeds along with the 25%-75% percentiles. ... This modified replay buffer D is characterized by two hyperparameters: ρ ∈ [0, 1], denoting the probability of sampling from synthesized data Ds in RL, and L ∈ ℕ, indicating the expected length of synthesized trajectories.
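
The role of ρ in the quoted Experiment Setup can be illustrated with a minimal sketch. This is not the authors' Algorithm 1: the class name `MixedReplayBuffer` and its methods are hypothetical, transitions are treated as opaque tuples, and the handling of L (the expected length of synthesized trajectories) is left to whatever generator produces the trajectories. The sketch only assumes that each sampled transition is drawn from the synthesized data Ds with probability ρ and from real data otherwise.

```python
import random


class MixedReplayBuffer:
    """Hypothetical sketch of a replay buffer mixing real and synthesized data.

    Assumption: each sampled transition comes from the synthesized store
    with probability rho, and from the real store otherwise.
    """

    def __init__(self, rho, seed=None):
        if not 0.0 <= rho <= 1.0:
            raise ValueError("rho must lie in [0, 1]")
        self.rho = rho
        self.real = []    # transitions collected from the environment
        self.synth = []   # transitions taken from synthesized trajectories (Ds)
        self.rng = random.Random(seed)

    def add_real(self, transition):
        self.real.append(transition)

    def add_synthetic_trajectory(self, trajectory):
        # A trajectory is any iterable of transitions; its length is decided
        # by the generator, which is outside the scope of this sketch.
        self.synth.extend(trajectory)

    def sample(self, batch_size):
        if not self.real and not self.synth:
            raise ValueError("cannot sample from an empty buffer")
        batch = []
        for _ in range(batch_size):
            # Fall back to whichever store is non-empty if the other is empty.
            want_synth = self.rng.random() < self.rho
            if (want_synth and self.synth) or not self.real:
                source = self.synth
            else:
                source = self.real
            batch.append(self.rng.choice(source))
        return batch
```

With ρ = 0 the buffer behaves like an ordinary replay buffer over real data, and with ρ = 1 every sample comes from synthesized trajectories once any are available.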