RORL: Robust Offline Reinforcement Learning via Conservative Smoothing

Authors: Rui Yang, Chenjia Bai, Xiaoteng Ma, Zhaoran Wang, Chongjie Zhang, Lei Han

NeurIPS 2022

Reproducibility Variables, Results, and LLM Responses:
Research Type: Experimental
LLM Response: "We demonstrate that RORL can achieve state-of-the-art performance on the general offline RL benchmark and is considerably robust to adversarial observation perturbations." "In our experiments, we demonstrate that RORL can achieve state-of-the-art (SOTA) performance in the D4RL benchmark [12] with fewer ensemble Q networks than the current SOTA approach [2]."
Researcher Affiliation: Collaboration
LLM Response: "Rui Yang¹, Chenjia Bai², Xiaoteng Ma³, Zhaoran Wang⁴, Chongjie Zhang³, Lei Han⁵ — ¹Hong Kong University of Science and Technology, ²Shanghai AI Laboratory, ³Tsinghua University, ⁴Northwestern University, ⁵Tencent Robotics X"
Pseudocode: Yes
LLM Response: "Figure 3: RORL Algorithm. RORL trains multiple Q-functions for uncertainty quantification. The conservative smoothing loss is calculated for (ŝ, a) with perturbed states. We perform uncertainty penalization for (ŝ, â) with perturbed states and OOD actions." "Algorithm 1: RORL Algorithm"
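To make the quoted mechanism concrete, here is a minimal PyTorch sketch of ensemble-based uncertainty penalization at perturbed state-action pairs. This is not the authors' implementation: the names `perturb_state` and `penalized_target`, the uniform L∞ perturbation sampling, and the mean-minus-std penalty form are all illustrative assumptions.

```python
# Hypothetical sketch, not the authors' code: penalize target Q-values at
# perturbed pairs (s_hat, a_hat) by the ensemble's disagreement.
import torch
import torch.nn as nn

def perturb_state(s: torch.Tensor, eps: float) -> torch.Tensor:
    """Sample a perturbed state s_hat uniformly within an L-inf ball of
    radius eps (one simple choice; the paper's scheme may differ)."""
    noise = (torch.rand_like(s) * 2.0 - 1.0) * eps
    return s + noise

def penalized_target(q_ensemble: nn.ModuleList, s_hat: torch.Tensor,
                     a_hat: torch.Tensor, beta: float) -> torch.Tensor:
    """Mean ensemble Q minus beta times the ensemble standard deviation,
    a common proxy for epistemic uncertainty at (s_hat, a_hat)."""
    sa = torch.cat([s_hat, a_hat], dim=-1)
    qs = torch.stack([q(sa) for q in q_ensemble], dim=0)  # (N, batch, 1)
    return qs.mean(dim=0) - beta * qs.std(dim=0)
```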
Open Source Code: Yes
LLM Response: "Our code is available at https://github.com/YangRui2015/RORL"
Open Datasets: Yes
LLM Response: "We evaluate our method on the D4RL benchmark [12] with various continuous-control tasks and datasets." ... "We cited D4RL [12] and EDAC [2] for their datasets and code."
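Since the cited datasets are public, a sketch like the following would load one of them with the standard D4RL API; the specific task string is an illustrative assumption, not one quoted above.

```python
# Minimal D4RL loading sketch; any registered task name could be used in
# place of "halfcheetah-medium-v2", which is assumed here for illustration.
import gym
import d4rl  # noqa: F401 -- importing registers the D4RL environments

env = gym.make("halfcheetah-medium-v2")
dataset = d4rl.qlearning_dataset(env)  # dict of observations, actions, rewards, ...
print(dataset["observations"].shape, dataset["actions"].shape)
```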
Dataset Splits: No
LLM Response: The paper evaluates on D4RL datasets and refers to an appendix for further hyperparameters and implementation details, but it does not explicitly state dataset splits (e.g., percentages or counts for train, validation, and test sets) in the provided text.
Hardware Specification: Yes
LLM Response: "We compare the computational cost of RORL with prior works on a single machine with one GPU (Tesla V100 32G)."
Software Dependencies: No
LLM Response: The paper mentions using a "2-layer MLP" and the "Adam optimizer" but does not specify version numbers for any software libraries, frameworks, or programming languages used.
Experiment Setup: Yes
LLM Response: "For benchmark experiments, we set small perturbation scales ε_P, ε_Q, and ε_ood within {0.001, 0.005, 0.01} when training RORL and do not include observation perturbation at testing time. We set the learning rates to 3e-4. The network structures are 2-layer MLPs with 256 hidden units. We use the Adam optimizer for all networks."
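A minimal sketch of the quoted network and optimizer setup, assuming "2-layer MLP with 256 hidden units" means two hidden layers; the input and output dimensions are placeholders, not values from the excerpt.

```python
# Sketch of the quoted setup; dimensions below are placeholder assumptions.
import torch
import torch.nn as nn

def make_mlp(in_dim: int, out_dim: int, hidden: int = 256) -> nn.Sequential:
    """Two hidden layers of 256 units each, per one reading of the quote."""
    return nn.Sequential(
        nn.Linear(in_dim, hidden), nn.ReLU(),
        nn.Linear(hidden, hidden), nn.ReLU(),
        nn.Linear(hidden, out_dim),
    )

# Example: a Q-network over concatenated (state, action) inputs.
q_net = make_mlp(in_dim=23, out_dim=1)
optimizer = torch.optim.Adam(q_net.parameters(), lr=3e-4)  # "learning rates to 3e-4"
```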