Efficient Offline Policy Optimization with a Learned Model
Authors: Zichen Liu, Siyi Li, Wee Sun Lee, Shuicheng YAN, Zhongwen Xu
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct extensive empirical studies with BSuite environments to verify the hypotheses and then run our algorithm on the RL Unplugged Atari benchmark. Experimental results show that our proposed approach achieves stable performance even with an inaccurate learned model. |
| Researcher Affiliation | Collaboration | Sea AI Lab; National University of Singapore. {liuzc,xuzw}@sea.com; {zichen,leews}@comp.nus.edu.sg |
| Pseudocode | Yes | A.1 Pseudocode: We present the detailed learning procedure of ROSMO in Algorithm 2. |
| Open Source Code | Yes | Our implementation is open-sourced at https://github.com/sail-sg/rosmo. |
| Open Datasets | Yes | We conduct extensive empirical studies with BSuite environments to verify the hypotheses and then run our algorithm on the RL Unplugged Atari benchmark. The datasets we collected do not contain any sensitive information and will be released. |
| Dataset Splits | No | The paper does not give explicit training/validation/test splits with percentages or counts. It mentions using 'sub-sampled datasets of different fractions' but does not break these down into train/validation/test sets. |
| Hardware Specification | Yes | We use TPUv3-8 machines for all the experiments in Atari and use CPU servers with 60 cores for BSuite experiments. |
| Software Dependencies | No | No, the paper mentions 'JAX (Bradbury et al., 2018)' but does not provide specific version numbers for JAX or any other software dependencies. |
| Experiment Setup | Yes | The hyperparameters shared by ROSMO and MuZero Unplugged for Atari environments are given in Table 3, and those for BSuite environments are given in Table 4. In addition, the behavior regularization strength (α) used in ROSMO is set to 0.2. |