Pre-training with Synthetic Data Helps Offline Reinforcement Learning
Authors: Zecheng Wang, Che Wang, Zixuan Dong, Keith W. Ross
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments and extensive ablations show that pre-training offline DRL models with simple synthetic datasets can significantly improve performance compared with those with no pre-training, both for transformer- and MLP-based backbones, with a low computation overhead. The results also show that large language datasets are not necessary for obtaining performance boosts, which sheds light on what kind of pre-training strategies are critical to improving RL performance and argues for increased usage of pre-training with synthetic data for an easy and consistent performance boost. |
| Researcher Affiliation | Academia | Zecheng Wang (1), Che Wang (2,4), Zixuan Dong (3,4), Keith Ross (1); 1: New York University Abu Dhabi; 2: New York University Shanghai; 3: SFSC of AI and DL, NYU Shanghai; 4: New York University |
| Pseudocode | No | No explicit pseudocode or algorithm block was found. |
| Open Source Code | No | We develop our code based on the implementation recommended by CQL authors (https://github.com/young-geng/CQL). Most of the hyperparameters used in the training process or the dataset follow the default setting, and we list them in detail in Table 15 and Table 16. Also, we provide additional implementation and experiment details below. |
| Open Datasets | Yes | We consider the same three MuJoCo environments and D4RL datasets (Fu et al., 2020) considered in Reid et al. (2022) plus the high-dimensional Ant environment, giving a total of 12 datasets. |
| Dataset Splits | No | For each dataset, we fine-tune for 100,000 updates. For DT+Wiki, we perform 80K updates during pre-training following the authors. For DT+Synthetic, however, we found that we can achieve good performance with much fewer pre-training updates, namely, 20K updates. |
| Hardware Specification | Yes | In terms of pre-training computation time, we run both Wikipedia and synthetic pre-training on 2 RTX 8000 GPUs. ... All experiments are run on a single RTX 8000 GPU with the default settings for 100k updates. |
| Software Dependencies | No | Pre-trained models are trained with the Hugging Face Transformers library (Wolf et al., 2020). We used the AdamW optimizer (Loshchilov & Hutter, 2017) for both pre-training and finetuning. Unless mentioned, we followed the default hyperparameter settings from Hugging Face and PyTorch. |
| Experiment Setup | Yes | Our hyperparameter choices follow those from Reid et al. (2022) for both pre-training and finetuning, which are shown in detail in table 9 and 10. ... Most of the hyperparameters used in the training process or the dataset follow the default setting, and we list them in detail in Table 15 and Table 16. |
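
The excerpts above describe the overall recipe: pre-train a transformer backbone on simple synthetic token sequences (roughly 20K updates with AdamW via the Hugging Face Transformers library), then fine-tune on a D4RL dataset for 100K updates. The sketch below is a hedged illustration of the synthetic pre-training stage only, not the authors' code: the vocabulary size, sequence length, model size, and the use of i.i.d. random tokens as the synthetic data are all assumptions made for the example.

```python
# Minimal sketch of synthetic pre-training (assumed details, not the paper's exact setup):
# pre-train a small GPT-2 on randomly generated token sequences with AdamW.
import torch
from torch.optim import AdamW
from transformers import GPT2Config, GPT2LMHeadModel

VOCAB_SIZE = 100           # synthetic token vocabulary size (assumed)
SEQ_LEN = 64               # synthetic sequence length (assumed)
PRETRAIN_UPDATES = 20_000  # 20K pre-training updates, as quoted above
BATCH_SIZE = 32            # batch size (assumed)

def sample_synthetic_batch(batch_size, seq_len, vocab_size, device):
    """Draw i.i.d. random token sequences as a stand-in for the synthetic data."""
    return torch.randint(0, vocab_size, (batch_size, seq_len), device=device)

device = "cuda" if torch.cuda.is_available() else "cpu"
config = GPT2Config(vocab_size=VOCAB_SIZE, n_layer=3, n_head=1, n_embd=128)
model = GPT2LMHeadModel(config).to(device)
optimizer = AdamW(model.parameters(), lr=1e-4)

model.train()
for step in range(PRETRAIN_UPDATES):
    tokens = sample_synthetic_batch(BATCH_SIZE, SEQ_LEN, VOCAB_SIZE, device)
    # Standard next-token prediction; labels are shifted internally by the model.
    loss = model(input_ids=tokens, labels=tokens).loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

In the paper's pipeline, the pre-trained weights would then initialize the Decision Transformer (or MLP backbone) before the 100K fine-tuning updates on the D4RL MuJoCo datasets; that fine-tuning stage is omitted here because it requires the D4RL/MuJoCo stack.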