Bridging Offline Reinforcement Learning and Imitation Learning: A Tale of Pessimism

Authors: Paria Rashidinejad, Banghua Zhu, Cong Ma, Jiantao Jiao, Stuart Russell

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | We do not include any experiments.
Researcher Affiliation | Academia | Paria Rashidinejad, Department of EECS, UC Berkeley, Berkeley, CA, 94709, paria.rashidinejad@berkeley.edu; Banghua Zhu, Department of EECS, UC Berkeley, Berkeley, CA, 94709, banghua@berkeley.edu; Cong Ma, Department of Statistics, University of Chicago, Chicago, IL, 60637, congm@uchicago.edu; Jiantao Jiao, Department of EECS, UC Berkeley, Berkeley, CA, 94709, jiantao@berkeley.edu; Stuart Russell, Department of EECS, UC Berkeley, Berkeley, CA, 94709, russell@berkeley.edu
Pseudocode | Yes | Algorithm 1: LCB for bandits and contextual bandits; Algorithm 2: Offline value iteration with LCB (VI-LCB)
Open Source Code | No | We do not include any experiments. Our work does not use any assets.
Open Datasets | No | The paper explicitly states 'We do not include any experiments.' and 'Our work does not use any assets.', indicating that the authors neither used nor provided a dataset.
Dataset Splits | No | We do not include any experiments.
Hardware Specification | No | We do not include any experiments.
Software Dependencies | No | We do not include any experiments.
Experiment Setup | No | We do not include any experiments.