Pre-training with Synthetic Data Helps Offline Reinforcement Learning
Authors: Zecheng Wang, Che Wang, Zixuan Dong, Keith W. Ross
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments and extensive ablations show that pre-training offline DRL models with simple synthetic datasets can significantly improve performance compared with those with no pre-training, both for transformer- and MLP-based backbones, with a low computation overhead. The results also show that large language datasets are not necessary for obtaining performance boosts, which sheds light on what kind of pre-training strategies are critical to improving RL performance and argues for increased usage of pre-training with synthetic data for an easy and consistent performance boost. |
| Researcher Affiliation | Academia | Zecheng Wang (1), Che Wang (2,4), Zixuan Dong (3,4), Keith Ross (1); 1: New York University Abu Dhabi; 2: New York University Shanghai; 3: SFSC of AI and DL, NYU Shanghai; 4: New York University |
| Pseudocode | No | No explicit pseudocode or algorithm block was found. |
| Open Source Code | No | We develop our code based on the implementation recommended by CQL authors (https://github.com/young-geng/CQL). Most of the hyperparameters used in the training process or the dataset follow the default setting, and we list them in detail in Table 15 and Table 16. Also, we provide additional implementation and experiment details below. |
| Open Datasets | Yes | We consider the same three MuJoCo environments and D4RL datasets (Fu et al., 2020) considered in Reid et al. (2022) plus the high-dimensional Ant environment, giving a total of 12 datasets. |
| Dataset Splits | No | For each dataset, we fine-tune for 100,000 updates. For DT+Wiki, we perform 80K updates during pre-training following the authors. For DT+Synthetic, however, we found that we can achieve good performance with much fewer pre-training updates, namely, 20K updates. |
| Hardware Specification | Yes | In terms of pre-training computation time, we run both Wikipedia and synthetic pre-training on 2 RTX 8000 GPUs. ... All experiments are run on a single RTX 8000 GPU with the default settings for 100k updates. |
| Software Dependencies | No | Pre-trained models are trained with the Hugging Face Transformers library (Wolf et al., 2020). We used the AdamW optimizer (Loshchilov & Hutter, 2017) for both pre-training and finetuning. Unless mentioned, we followed the default hyperparameter settings from Hugging Face and PyTorch. |
| Experiment Setup | Yes | Our hyperparameter choices follow those from Reid et al. (2022) for both pre-training and finetuning, which are shown in detail in table 9 and 10. ... Most of the hyperparameters used in the training process or the dataset follow the default setting, and we list them in detail in Table 15 and Table 16. |
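
The excerpts above describe the overall recipe: pre-train a transformer backbone on simple synthetic token sequences (roughly 20K updates with AdamW via the Hugging Face Transformers library), then fine-tune on a D4RL dataset for 100K updates. The sketch below is a hedged illustration of the synthetic pre-training stage only, not the authors' code: the vocabulary size, sequence length, model size, and the use of i.i.d. random tokens as the synthetic data are all assumptions made for the example.

```python
# Minimal sketch of synthetic pre-training (assumed details, not the paper's exact setup):
# pre-train a small GPT-2 on randomly generated token sequences with AdamW.
import torch
from torch.optim import AdamW
from transformers import GPT2Config, GPT2LMHeadModel

VOCAB_SIZE = 100           # synthetic token vocabulary size (assumed)
SEQ_LEN = 64               # synthetic sequence length (assumed)
PRETRAIN_UPDATES = 20_000  # 20K pre-training updates, as quoted above
BATCH_SIZE = 32            # batch size (assumed)

def sample_synthetic_batch(batch_size, seq_len, vocab_size, device):
    """Draw i.i.d. random token sequences as a stand-in for the synthetic data."""
    return torch.randint(0, vocab_size, (batch_size, seq_len), device=device)

device = "cuda" if torch.cuda.is_available() else "cpu"
config = GPT2Config(vocab_size=VOCAB_SIZE, n_layer=3, n_head=1, n_embd=128)
model = GPT2LMHeadModel(config).to(device)
optimizer = AdamW(model.parameters(), lr=1e-4)

model.train()
for step in range(PRETRAIN_UPDATES):
    tokens = sample_synthetic_batch(BATCH_SIZE, SEQ_LEN, VOCAB_SIZE, device)
    # Standard next-token prediction; labels are shifted internally by the model.
    loss = model(input_ids=tokens, labels=tokens).loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

In the paper's pipeline, the pre-trained weights would then initialize the Decision Transformer (or MLP backbone) before the 100K fine-tuning updates on the D4RL MuJoCo datasets; that fine-tuning stage is omitted here because it requires the D4RL/MuJoCo stack.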