Robust Reinforcement Learning using Offline Data

Authors: Kishan Panaganti, Zaiyan Xu, Dileep Kalathil, Mohammad Ghavamzadeh

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this work, we propose a robust RL algorithm called Robust Fitted Q-Iteration (RFQI)... We prove that RFQI learns a near-optimal robust policy under standard assumptions and demonstrate its superior performance on standard benchmark problems.
Researcher Affiliation | Collaboration | Kishan Panaganti¹, Zaiyan Xu¹, Dileep Kalathil¹, Mohammad Ghavamzadeh²; ¹Texas A&M University, ²Google Research. Emails: {kpb, zxu43, dileep.kalathil}@tamu.edu, ghavamza@google.com
Pseudocode | Yes | Algorithm 1 Robust Fitted Q-Iteration (RFQI) Algorithm
Open Source Code | Yes | We provide our code in github webpage https://github.com/zaiyan-x/RFQI containing instructions to reproduce all results in this paper.
Open Datasets | Yes | Here, we demonstrate the robust performance of our RFQI algorithm by evaluating it on Cartpole and Hopper environments in OpenAI Gym (Brockman et al., 2016).
Dataset Splits | No | The paper states it uses an 'offline dataset D' but does not provide specific details on how this dataset is split into training, validation, or test sets, nor does it specify any cross-validation setup.
Hardware Specification | Yes | All experiments were performed on a single machine with NVIDIA A100 GPUs.
Software Dependencies | No | The paper mentions using the 'stable-baselines3 library' but does not provide specific version numbers for this or any other software dependencies like Python or deep learning frameworks.
Experiment Setup | Yes | We use a feedforward neural network with 2 hidden layers and 256 neurons in each layer. We use the Adam optimizer (Kingma and Ba, 2014) with a learning rate of 0.0003 and a batch size of 256. For the Cartpole, we train for 1000 epochs. For the Hopper, we train for 100 epochs. We use the discount factor γ = 0.99 for all experiments. The radius of the uncertainty set is set to ρ = 0.1 for Cartpole experiments and ρ = 0.5 for Hopper experiments.
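The table describes RFQI only at a high level and names an uncertainty-set radius ρ without saying what it controls. As a hedged illustration, the sketch below assumes a total-variation (TV) uncertainty set of radius ρ around a nominal next-state distribution P0 over finitely many next states. This setting and every name in it (worst_case_primal, worst_case_dual, robust_target) are assumptions of this sketch, not the paper's code. It computes the worst-case expected next-state value in two ways: directly, by letting an adversary shift up to ρ probability mass from the highest-value states onto the lowest-value state, and through the standard dual form sup over η of (1 − ρ)η − E_P0[(η − V)₊], plus ρ · min V.

```python
"""Worst-case expectation over a total-variation (TV) ball: illustrative sketch.

Assumed uncertainty set: {P : 0.5 * sum_s |P(s) - P0(s)| <= rho}, 0 <= rho <= 1,
over a finite set of next states whose values are v[s].
"""
from typing import Sequence


def worst_case_primal(p0: Sequence[float], v: Sequence[float], rho: float) -> float:
    """Shift up to `rho` probability mass from the highest-value next states
    onto the lowest-value next state, then take the expectation."""
    p = list(p0)
    low = min(range(len(v)), key=lambda i: v[i])
    budget = rho
    for s in sorted(range(len(v)), key=lambda i: v[i], reverse=True):
        if budget <= 0.0 or s == low:
            break  # no budget left, or only minimum-value states remain
        moved = min(p[s], budget)
        p[s] -= moved
        p[low] += moved
        budget -= moved
    return sum(ps * vs for ps, vs in zip(p, v))


def worst_case_dual(p0: Sequence[float], v: Sequence[float], rho: float) -> float:
    """Dual form: sup_eta [(1 - rho) * eta - E_P0[(eta - V)_+]] + rho * min V.

    The objective is concave and piecewise linear in eta, with kinks at the
    values of V, so evaluating it at those values finds the supremum.
    """
    def objective(eta: float) -> float:
        shortfall = sum(ps * max(eta - vs, 0.0) for ps, vs in zip(p0, v))
        return (1.0 - rho) * eta - shortfall

    return max(objective(eta) for eta in v) + rho * min(v)


def robust_target(r: float, gamma: float, p0: Sequence[float],
                  v: Sequence[float], rho: float) -> float:
    """Hypothetical robust Bellman target: r + gamma * worst-case E[V]."""
    return r + gamma * worst_case_dual(p0, v, rho)


if __name__ == "__main__":
    p0, v = [0.5, 0.5], [0.0, 10.0]
    # The primal and dual computations should agree.
    print(worst_case_primal(p0, v, 0.1), worst_case_dual(p0, v, 0.1))
```

In this setting, the expectation term of the dual form depends only on the nominal P0, so it can in principle be estimated from samples drawn from P0. The ρ · min V term needs separate handling, which this sketch does not address. With neural-network function approximation, as in the Experiment Setup row, η would typically be optimized rather than enumerated over the values of V. That step is also not shown here.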
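The hyperparameters listed in the Experiment Setup row can be collected in one place. The sketch below records them as a frozen dataclass. The names RFQIConfig and CONFIGS and all field names are hypothetical, and the dictionary keys reuse the environment names from the table rather than actual Gym environment IDs. The values are taken from the Experiment Setup row. Library versions are not listed because, as the Software Dependencies row notes, the paper does not give them.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class RFQIConfig:
    """Hypothetical container for the hyperparameters reported in the table."""
    rho: float                                # radius of the uncertainty set
    epochs: int                               # training epochs
    hidden_sizes: Tuple[int, ...] = (256, 256)  # 2 hidden layers, 256 units each
    learning_rate: float = 3e-4               # Adam learning rate
    batch_size: int = 256
    gamma: float = 0.99                       # discount factor, all experiments


CONFIGS: Dict[str, RFQIConfig] = {
    "Cartpole": RFQIConfig(rho=0.1, epochs=1000),
    "Hopper": RFQIConfig(rho=0.5, epochs=100),
}

if __name__ == "__main__":
    for env, cfg in CONFIGS.items():
        print(env, cfg)
```

Shared settings are defaults, and only the per-environment values (ρ and the epoch count) are passed explicitly. This keeps the difference between the two environments visible at a glance.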