Efficient Offline Policy Optimization with a Learned Model

Authors: Zichen Liu, Siyi Li, Wee Sun Lee, Shuicheng Yan, Zhongwen Xu

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct extensive empirical studies with BSuite environments to verify the hypotheses and then run our algorithm on the RL Unplugged Atari benchmark. Experimental results show that our proposed approach achieves stable performance even with an inaccurate learned model.
Researcher Affiliation | Collaboration | Sea AI Lab; National University of Singapore. {liuzc,xuzw}@sea.com; {zichen,leews}@comp.nus.edu.sg
Pseudocode | Yes | A.1 Pseudocode: We present the detailed learning procedure of ROSMO in Algorithm 2. (An illustrative sketch of the core loss follows this table.)
Open Source Code | Yes | Our implementation is open-sourced at https://github.com/sail-sg/rosmo.
Open Datasets | Yes | We conduct extensive empirical studies with BSuite environments to verify the hypotheses and then run our algorithm on the RL Unplugged Atari benchmark. The datasets we collected do not contain any sensitive information and will be released.
Dataset Splits | No | The paper does not explicitly provide training/validation/test dataset splits with specific percentages or counts. It mentions using "sub-sampled datasets of different fractions" but no specific train/validation/test splits.
Hardware Specification | Yes | We use TPUv3-8 machines for all the experiments in Atari and use CPU servers with 60 cores for BSuite experiments.
Software Dependencies | No | The paper mentions "JAX (Bradbury et al., 2018)" but does not provide specific version numbers for JAX or any other software dependencies.
Experiment Setup | Yes | The hyperparameters shared by ROSMO and MuZero Unplugged for Atari environments are given in Table 3, and those for BSuite environments are given in Table 4. In addition, the behavior regularization strength (α) used in ROSMO is chosen to be 0.2.
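For readers skimming this report, the following is a minimal, hypothetical JAX sketch of the kind of one-step, behavior-regularized policy-improvement loss that Algorithm 2 of the paper describes. It is not the authors' implementation: the function name rosmo_policy_loss, the shapes, the exponential re-weighting of the prior policy by one-step advantages, and the behavior-cloning-style regularizer are all illustrative assumptions; only the regularization strength α = 0.2 comes from the paper, and the official code at https://github.com/sail-sg/rosmo is the reference.

import jax
import jax.numpy as jnp


def rosmo_policy_loss(policy_logits, prior_logits, q_values, value,
                      behavior_action, alpha=0.2):
    """Sketch of a one-step improved policy target plus behavior regularization.

    policy_logits:   [A] current policy logits for state s.
    prior_logits:    [A] prior (e.g. target-network) policy logits.
    q_values:        [A] model-based one-step returns r_hat(s, a) + gamma * v_hat(s').
    value:           []  predicted state value v(s).
    behavior_action: []  integer action taken in the dataset at s.
    alpha:           behavior regularization strength (0.2 per the paper).
    """
    advantages = q_values - value  # one-step advantages from the learned model
    # Adding advantages to the prior logits and normalizing yields a target
    # proportional to pi_prior(a) * exp(A(s, a)), without explicit normalization.
    target_logits = jax.lax.stop_gradient(prior_logits + advantages)
    target_probs = jax.nn.softmax(target_logits)
    # Cross-entropy between the improved target and the current policy.
    log_probs = jax.nn.log_softmax(policy_logits)
    improvement_loss = -jnp.sum(target_probs * log_probs)
    # Behavior-cloning-style regularizer (assumed form): keep the policy close
    # to the dataset action to limit off-support generalization of the model.
    regularization = -log_probs[behavior_action]
    return improvement_loss + alpha * regularization


# Tiny usage example with random inputs (illustrative only).
key = jax.random.PRNGKey(0)
k1, k2, k3 = jax.random.split(key, 3)
num_actions = 4
loss = rosmo_policy_loss(
    policy_logits=jax.random.normal(k1, (num_actions,)),
    prior_logits=jax.random.normal(k2, (num_actions,)),
    q_values=jax.random.normal(k3, (num_actions,)),
    value=jnp.array(0.0),
    behavior_action=2,
    alpha=0.2,
)
print(loss)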