Synthetic Experience Replay

Authors: Cong Lu, Philip Ball, Yee Whye Teh, Jack Parker-Holder

NeurIPS 2023

Reproducibility assessment (variable, result, and supporting LLM response):
Research Type: Experimental
    "We show that SynthER is an effective method for training RL agents across offline and online settings, in both proprioceptive and pixel-based environments. In offline settings, we observe drastic improvements when upsampling small offline datasets and see that additional synthetic data also allows us to effectively train larger networks. Furthermore, SynthER enables online agents to train with a much higher update-to-data ratio than before, leading to a significant increase in sample efficiency, without any algorithmic changes."

Researcher Affiliation: Academia
    Cong Lu, Philip J. Ball, Yee Whye Teh, Jack Parker-Holder (University of Oxford)

Pseudocode: Yes
    "Algorithm 1: SynthER for online replay-based algorithms. Our additions are highlighted in blue."
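For concreteness, here is a minimal sketch of the loop Algorithm 1 describes, not the authors' actual implementation: the `agent`, `diffusion`, and buffer objects, the refresh interval, and the real/synthetic mixing fraction are all illustrative assumptions.

```python
# Hedged sketch of a SynthER-style online loop (cf. Algorithm 1).
# `agent` and `diffusion` are hypothetical placeholder objects, not the
# authors' API; intervals and ratios are illustrative, not the paper's values.
import random

def synther_online_loop(env, agent, diffusion, total_steps,
                        refresh_every=10_000, utd_ratio=20,
                        real_frac=0.5, batch_size=256):
    real_buffer, synthetic_buffer = [], []
    obs = env.reset()
    for step in range(total_steps):
        # Standard environment interaction, stored in the real replay buffer.
        action = agent.act(obs)
        next_obs, reward, done, _ = env.step(action)
        real_buffer.append((obs, action, reward, next_obs, done))
        obs = env.reset() if done else next_obs

        # SynthER-style additions: periodically refit the diffusion model on
        # real transitions and regenerate the synthetic buffer.
        if step > 0 and step % refresh_every == 0:
            diffusion.fit(real_buffer)
            synthetic_buffer = diffusion.sample(num_transitions=100_000)

        # High update-to-data ratio: many gradient updates per environment
        # step, each on a mix of real and synthetic transitions.
        for _ in range(utd_ratio):
            source = (real_buffer if random.random() < real_frac
                      or not synthetic_buffer else synthetic_buffer)
            batch = random.sample(source, min(batch_size, len(source)))
            agent.update(batch)
```

This matches the paper's claim that the gains come purely from the data side: the agent update itself is unchanged, only the buffer it samples from is augmented.
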
Open Source Code: Yes
    "Finally, we open-source our code at https://github.com/conglu1997/SynthER."

Open Datasets: Yes
    "We first verify that synthetic samples from SynthER faithfully model the underlying distribution from the canonical offline D4RL [21] datasets. To do this, we evaluate SynthER in combination with 3 widely-used SOTA offline RL algorithms: TD3+BC (Fujimoto and Gu [22], explicit policy regularization), IQL (Kostrikov et al. [41], expectile regression), and EDAC (An et al. [5], uncertainty-based regularization) on an extensive selection of D4RL datasets. We consider the V-D4RL [50] benchmarking suite, a set of standardized pixel-based offline datasets, and focus on the cheetah-run and walker-walk environments."
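As a pointer for reproduction, the canonical D4RL datasets the paper upsamples can be loaded as flat transition arrays roughly as follows (assuming the `d4rl` package; exact environment IDs such as the `-v2` suffix depend on the installed version):

```python
# Load a canonical D4RL dataset as flat (s, a, r, s', done) arrays.
# Assumes the `d4rl` package is installed; dataset names/versions may vary.
import gym
import d4rl  # importing registers the offline environments with gym

env = gym.make("halfcheetah-medium-replay-v2")
dataset = d4rl.qlearning_dataset(env)

# These transition arrays are what the diffusion model is trained on
# and then upsampled from.
for key in ("observations", "actions", "rewards", "next_observations", "terminals"):
    print(key, dataset[key].shape)
```
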
Dataset Splits: No
    The paper uses the standard D4RL and V-D4RL datasets and upsamples them to 5M samples for training, but it does not define explicit train/validation/test splits, either for the original datasets or for the upsampled synthetic data generated from them.

Hardware Specification: Yes
    "On the 200K DMC experiments, SAC (SynthER) takes 21.1 hours compared to 22.7 hours with REDQ on a V100 GPU."

Software Dependencies: No
    The paper mentions several software components (CORL, REDQ, the dmcgym wrapper, the V-D4RL codebase, and the denoising-diffusion-pytorch implementation) but does not provide version numbers for any of them.

Experiment Setup: Yes
    "The hyperparameters are listed in Table 8. The base size of the network uses a width of 1024 and depth of 6 and thus has 6M parameters. We adjust the batch size for training based on dataset size. For online training and offline datasets with fewer than 1 million samples (medium-replay datasets) we use a batch size of 256, and 1024 otherwise. [...] learning rate 3 × 10^-4"
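A hedged sketch of a denoising network at the reported scale (width 1024, depth 6, Adam with learning rate 3 × 10^-4): the timestep conditioning and any residual connections of the actual diffusion denoiser are omitted here, and the transition dimensionality is an illustrative assumption.

```python
# Sketch of an MLP denoiser at the reported scale (width 1024, depth 6).
# The real denoiser also conditions on the diffusion timestep; that and
# any residual connections are omitted for brevity.
import torch
import torch.nn as nn

def make_denoiser(transition_dim: int, width: int = 1024, depth: int = 6) -> nn.Module:
    layers = [nn.Linear(transition_dim, width), nn.ReLU()]
    for _ in range(depth - 1):
        layers += [nn.Linear(width, width), nn.ReLU()]
    layers.append(nn.Linear(width, transition_dim))
    return nn.Sequential(*layers)

# e.g. HalfCheetah: 17 (obs) + 6 (action) + 1 (reward) + 17 (next obs) = 41
net = make_denoiser(transition_dim=41)
opt = torch.optim.Adam(net.parameters(), lr=3e-4)
batch_size = 256  # 1024 for offline datasets with >= 1M samples
print(sum(p.numel() for p in net.parameters()))  # ~5-6M, in line with the ~6M reported
```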