Robust Reinforcement Learning using Offline Data
Authors: Kishan Panaganti, Zaiyan Xu, Dileep Kalathil, Mohammad Ghavamzadeh
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this work, we propose a robust RL algorithm called Robust Fitted Q-Iteration (RFQI)... We prove that RFQI learns a near-optimal robust policy under standard assumptions and demonstrate its superior performance on standard benchmark problems. |
| Researcher Affiliation | Collaboration | Kishan Panaganti^1, Zaiyan Xu^1, Dileep Kalathil^1, Mohammad Ghavamzadeh^2 (^1 Texas A&M University, ^2 Google Research). Emails: {kpb, zxu43, dileep.kalathil}@tamu.edu, ghavamza@google.com |
| Pseudocode | Yes | Algorithm 1 Robust Fitted Q-Iteration (RFQI) Algorithm (a hedged sketch of this loop appears below the table) |
| Open Source Code | Yes | We provide our code on the GitHub page https://github.com/zaiyan-x/RFQI, containing instructions to reproduce all results in this paper. |
| Open Datasets | Yes | Here, we demonstrate the robust performance of our RFQI algorithm by evaluating it on the Cartpole and Hopper environments in OpenAI Gym (Brockman et al., 2016). |
| Dataset Splits | No | The paper states it uses an 'offline dataset D' but does not provide specific details on how this dataset is split into training, validation, or test sets, nor does it specify any cross-validation setup. |
| Hardware Specification | Yes | All experiments were performed on a single machine with NVIDIA A100 GPUs. |
| Software Dependencies | No | The paper mentions using the 'stable-baselines3 library' but does not provide specific version numbers for this or any other software dependencies like Python or deep learning frameworks. |
| Experiment Setup | Yes | We use a feedforward neural network with 2 hidden layers and 256 neurons in each layer. We use the Adam optimizer (Kingma and Ba, 2014) with a learning rate of 0.0003 and a batch size of 256. For the Cartpole, we train for 1000 epochs. For the Hopper, we train for 100 epochs. We use the discount factor γ = 0.99 for all experiments. The radius of the uncertainty set is set to ρ = 0.1 for Cartpole experiments and ρ = 0.5 for Hopper experiments. (These values are collected in the PyTorch configuration sketch below the table.) |
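
The pseudocode row above refers to Algorithm 1 (RFQI). A minimal sketch of that fitted-Q loop follows, under assumptions stated in the comments: a grid-searched global dual variable stands in for the paper's learned dual function g, a scikit-learn regressor stands in for its function class, and the TV-distance dual form shown is one standard reformulation that may differ from the paper's exact operator.

```python
# Minimal sketch of the RFQI loop, not the paper's implementation. Assumptions:
# rewards in [0, 1], a random-forest regressor standing in for the paper's
# function class, and one global dual variable eta picked by grid search where
# the paper learns a state-action-dependent dual function g(s, a). The robust
# target uses a standard dual form for TV-distance uncertainty sets,
#   inf_{P: TV(P, P0) <= rho} E_P[V] = max_eta E_P0[min(V, eta)] - rho*(eta - min V),
# which may differ from the paper's exact reformulation.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def rfqi(states, actions, rewards, next_states, n_actions,
         gamma=0.99, rho=0.1, n_iters=50):
    """Robust fitted Q-iteration over an offline transition dataset."""
    q = RandomForestRegressor(n_estimators=50)
    eta_grid = np.linspace(0.0, 1.0 / (1.0 - gamma), 21)  # dual-variable grid
    v_next = np.zeros(len(next_states))                   # V_0 = 0
    for it in range(n_iters):
        if it > 0:
            # Greedy next-state values under the current Q estimate.
            q_next = np.stack(
                [q.predict(np.column_stack(
                    [next_states, np.full(len(next_states), b)]))
                 for b in range(n_actions)], axis=1)
            v_next = q_next.max(axis=1)
        # Grid-search one eta maximizing the dataset-average dual objective.
        v_min = v_next.min()
        dual = [np.minimum(v_next, eta).mean() - rho * (eta - v_min)
                for eta in eta_grid]
        eta = eta_grid[int(np.argmax(dual))]
        # Per-sample robust regression target, then one fitted-Q regression step.
        y = rewards + gamma * (np.minimum(v_next, eta) - rho * (eta - v_min))
        q.fit(np.column_stack([states, actions]), y)
    return q
```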
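
The experiment-setup row reports concrete training hyperparameters; the sketch below collects them in PyTorch. The `QNetwork` name, ReLU activations, and CartPole dimensions (`obs_dim=4`, `n_actions=2`) are illustrative assumptions; the layer widths, optimizer, learning rate, batch size, discount factor, and uncertainty radii are the values quoted above.

```python
# Sketch collecting the reported hyperparameters in PyTorch. Only the layer
# widths (2 x 256), Adam with lr 3e-4, batch size 256, gamma = 0.99, and the
# rho values come from the paper; everything else here is an assumption.
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    def __init__(self, obs_dim: int, n_actions: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, n_actions),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)

q_net = QNetwork(obs_dim=4, n_actions=2)  # CartPole-sized network (assumed)
optimizer = torch.optim.Adam(q_net.parameters(), lr=3e-4)
batch_size = 256
gamma = 0.99   # discount factor, all experiments
rho = 0.1      # uncertainty-set radius for Cartpole (0.5 for Hopper)
```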