State Regularized Policy Optimization on Data with Dynamics Shift

Authors: Zhenghai Xue, Qingpeng Cai, Shuchang Liu, Dong Zheng, Peng Jiang, Kun Gai, Bo An

NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results show that SRPO can make several context-based algorithms far more data efficient and significantly improve their overall performance. In this section, we conduct experiments to investigate the following questions: (1) Can SRPO leverage data with distribution shift and outperform current SOTA algorithms in the setting of HiP-MDP, in both online and offline RL?
Researcher Affiliation | Collaboration | 1 Nanyang Technological University, Singapore; 2 Kuaishou Technology; 3 Unaffiliated
Pseudocode | Yes | Algorithm 1: The workflow of SRPO on top of MAPLE [12].
Open Source Code | No | The paper does not provide an explicit statement or a link indicating that the source code for the described methodology is open source or publicly available.
Open Datasets | Yes | Then a set of states is sampled from the D4RL [42] dataset and classified into two sets according to the output of Dδ. Justin Fu, Aviral Kumar, Ofir Nachum, George Tucker, and Sergey Levine. D4RL: Datasets for deep data-driven reinforcement learning. CoRR, abs/2004.07219, 2020. (A minimal sketch of this sampling-and-classification step follows the table.)
Dataset Splits | No | The paper mentions using the D4RL dataset but does not provide specific details on how it was split into training, validation, and test sets for its experiments, nor does it cite a standard split.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments.
Software Dependencies | No | The paper mentions using the MuJoCo simulator and various RL algorithms (PPO, CaDM, MAPLE, CQL) but does not provide specific version numbers for any software dependencies or libraries.
Experiment Setup | Yes | We alter the simulator gravity to generate different dynamics in online experiments. Possible values of gravity are {1.0}, {0.7, 1.0, 1.3}, and {0.4, 0.7, 1.0, 1.3, 1.6} in experiments with 1, 3, and 5 kinds of different dynamics, respectively. We set ρ = 0.5 in offline experiments with medium-expert level of data. ρ = 0.2 is set in all other experiments. λ is regarded as a hyperparameter with values 0.1 or 0.3. (A hedged sketch of the gravity variation follows the table.)
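
The "Open Datasets" row quotes a step in which states sampled from D4RL are split into two sets by a discriminator Dδ. Below is a minimal sketch of that step in Python, assuming the standard gym/d4rl dataset API; the discriminator `D_delta` (an untrained MLP here), the environment id, and the 0.5 decision threshold are illustrative stand-ins, not the paper's actual model or rule.

```python
# Minimal sketch (not the authors' code): sample states from a D4RL dataset
# and classify them into two sets with a state discriminator D_delta.
# `D_delta` and the 0.5 threshold are illustrative assumptions.
import gym
import d4rl  # noqa: F401  (importing d4rl registers its environments with gym)
import numpy as np
import torch

env = gym.make("halfcheetah-medium-v2")   # any D4RL task id works here
dataset = env.get_dataset()               # standard D4RL dict-of-arrays API
states = dataset["observations"]

# Sample a batch of states from the offline dataset.
idx = np.random.choice(len(states), size=256, replace=False)
batch = torch.as_tensor(states[idx], dtype=torch.float32)

# Stand-in for the paper's discriminator D_delta: a binary classifier over
# states (here an untrained MLP with a sigmoid output, for illustration only).
D_delta = torch.nn.Sequential(
    torch.nn.Linear(batch.shape[1], 256), torch.nn.ReLU(),
    torch.nn.Linear(256, 1), torch.nn.Sigmoid(),
)

with torch.no_grad():
    scores = D_delta(batch).squeeze(-1)

# Classify the sampled states into two sets according to D_delta's output.
set_high = batch[scores >= 0.5]
set_low = batch[scores < 0.5]
print(f"{len(set_high)} states above threshold, {len(set_low)} below")
```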
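
The "Experiment Setup" row lists gravity value sets used to generate 1, 3, or 5 kinds of dynamics. The sketch below shows one common way to realize this with gym MuJoCo environments; treating the listed numbers as multipliers of the default gravity and editing `model.opt.gravity` in place are assumptions on our part, and the ρ and λ hyperparameters quoted above are not modeled here.

```python
# Minimal sketch (an assumption, not the authors' code): building MuJoCo
# environments whose gravity is rescaled to create 1, 3, or 5 kinds of
# dynamics, mirroring the value sets quoted above. The numbers are treated
# as multipliers of the default gravity; the paper's exact mechanism for
# altering the simulator may differ.
import gym
import numpy as np

GRAVITY_SETS = {
    1: [1.0],
    3: [0.7, 1.0, 1.3],
    5: [0.4, 0.7, 1.0, 1.3, 1.6],
}

def make_env_with_gravity(env_id: str, scale: float) -> gym.Env:
    """Create a MuJoCo env and scale its gravity vector by `scale`."""
    env = gym.make(env_id)
    gravity = env.unwrapped.model.opt.gravity   # e.g. array([0., 0., -9.81])
    gravity[:] = np.asarray(gravity) * scale    # in-place edit of the model option
    return env

# Example: the 3-dynamics online setting on HalfCheetah (env id is illustrative).
envs = [make_env_with_gravity("HalfCheetah-v3", s) for s in GRAVITY_SETS[3]]
```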