Synthetic Experience Replay

Authors: Cong Lu, Philip Ball, Yee Whye Teh, Jack Parker-Holder

NeurIPS 2023

Reproducibility assessment (variable, result, and supporting LLM response):
Research Type: Experimental
    "We show that SynthER is an effective method for training RL agents across offline and online settings, in both proprioceptive and pixel-based environments. In offline settings, we observe drastic improvements when upsampling small offline datasets and see that additional synthetic data also allows us to effectively train larger networks. Furthermore, SynthER enables online agents to train with a much higher update-to-data ratio than before, leading to a significant increase in sample efficiency, without any algorithmic changes."

Researcher Affiliation: Academia
    Cong Lu, Philip J. Ball, Yee Whye Teh, Jack Parker-Holder (University of Oxford)

Pseudocode: Yes
    "Algorithm 1: SynthER for online replay-based algorithms. Our additions are highlighted in blue."
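For concreteness, here is a minimal sketch of the loop Algorithm 1 describes, not the authors' actual implementation: the `agent`, `diffusion`, and buffer objects, the refresh interval, and the real/synthetic mixing fraction are all illustrative assumptions.

```python
# Hedged sketch of a SynthER-style online loop (cf. Algorithm 1).
# `agent` and `diffusion` are hypothetical placeholder objects, not the
# authors' API; intervals and ratios are illustrative, not the paper's values.
import random

def synther_online_loop(env, agent, diffusion, total_steps,
                        refresh_every=10_000, utd_ratio=20,
                        real_frac=0.5, batch_size=256):
    real_buffer, synthetic_buffer = [], []
    obs = env.reset()
    for step in range(total_steps):
        # Standard environment interaction, stored in the real replay buffer.
        action = agent.act(obs)
        next_obs, reward, done, _ = env.step(action)
        real_buffer.append((obs, action, reward, next_obs, done))
        obs = env.reset() if done else next_obs

        # SynthER-style additions: periodically refit the diffusion model on
        # real transitions and regenerate the synthetic buffer.
        if step > 0 and step % refresh_every == 0:
            diffusion.fit(real_buffer)
            synthetic_buffer = diffusion.sample(num_transitions=100_000)

        # High update-to-data ratio: many gradient updates per environment
        # step, each on a mix of real and synthetic transitions.
        for _ in range(utd_ratio):
            source = (real_buffer if random.random() < real_frac
                      or not synthetic_buffer else synthetic_buffer)
            batch = random.sample(source, min(batch_size, len(source)))
            agent.update(batch)
```

This matches the paper's claim that the gains come purely from the data side: the agent update itself is unchanged, only the buffer it samples from is augmented.
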
Open Source Code: Yes
    "Finally, we open-source our code at https://github.com/conglu1997/SynthER."

Open Datasets: Yes
    "We first verify that synthetic samples from SynthER faithfully model the underlying distribution from the canonical offline D4RL [21] datasets. To do this, we evaluate SynthER in combination with 3 widely-used SOTA offline RL algorithms: TD3+BC (Fujimoto and Gu [22], explicit policy regularization), IQL (Kostrikov et al. [41], expectile regression), and EDAC (An et al. [5], uncertainty-based regularization) on an extensive selection of D4RL datasets. We consider the V-D4RL [50] benchmarking suite, a set of standardized pixel-based offline datasets, and focus on the cheetah-run and walker-walk environments."
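As a pointer for reproduction, the canonical D4RL datasets the paper upsamples can be loaded as flat transition arrays roughly as follows (assuming the `d4rl` package; exact environment IDs such as the `-v2` suffix depend on the installed version):

```python
# Load a canonical D4RL dataset as flat (s, a, r, s', done) arrays.
# Assumes the `d4rl` package is installed; dataset names/versions may vary.
import gym
import d4rl  # importing registers the offline environments with gym

env = gym.make("halfcheetah-medium-replay-v2")
dataset = d4rl.qlearning_dataset(env)

# These transition arrays are what the diffusion model is trained on
# and then upsampled from.
for key in ("observations", "actions", "rewards", "next_observations", "terminals"):
    print(key, dataset[key].shape)
```
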
Dataset Splits: No
    The paper uses the standard D4RL and V-D4RL datasets and upsamples them to 5M samples for training, but it does not define explicit train/validation/test splits, either for the original datasets or for the upsampled synthetic data generated from them.

Hardware Specification: Yes
    "On the 200K DMC experiments, SAC (SynthER) takes 21.1 hours compared to 22.7 hours with REDQ on a V100 GPU."

Software Dependencies: No
    The paper mentions several software components (CORL, REDQ, the dmcgym wrapper, the V-D4RL codebase, and the denoising-diffusion-pytorch implementation) but does not provide version numbers for any of them.

Experiment Setup: Yes
    "The hyperparameters are listed in Table 8. The base size of the network uses a width of 1024 and depth of 6 and thus has 6M parameters. We adjust the batch size for training based on dataset size. For online training and offline datasets with fewer than 1 million samples (medium-replay datasets) we use a batch size of 256, and 1024 otherwise. [...] learning rate 3 × 10^-4"
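A hedged sketch of a denoising network at the reported scale (width 1024, depth 6, Adam with learning rate 3 × 10^-4): the timestep conditioning and any residual connections of the actual diffusion denoiser are omitted here, and the transition dimensionality is an illustrative assumption.

```python
# Sketch of an MLP denoiser at the reported scale (width 1024, depth 6).
# The real denoiser also conditions on the diffusion timestep; that and
# any residual connections are omitted for brevity.
import torch
import torch.nn as nn

def make_denoiser(transition_dim: int, width: int = 1024, depth: int = 6) -> nn.Module:
    layers = [nn.Linear(transition_dim, width), nn.ReLU()]
    for _ in range(depth - 1):
        layers += [nn.Linear(width, width), nn.ReLU()]
    layers.append(nn.Linear(width, transition_dim))
    return nn.Sequential(*layers)

# e.g. HalfCheetah: 17 (obs) + 6 (action) + 1 (reward) + 17 (next obs) = 41
net = make_denoiser(transition_dim=41)
opt = torch.optim.Adam(net.parameters(), lr=3e-4)
batch_size = 256  # 1024 for offline datasets with >= 1M samples
print(sum(p.numel() for p in net.parameters()))  # ~5-6M, in line with the ~6M reported
```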