Bridging Offline Reinforcement Learning and Imitation Learning: A Tale of Pessimism
Authors: Paria Rashidinejad, Banghua Zhu, Cong Ma, Jiantao Jiao, Stuart Russell
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We do not include any experiments. |
| Researcher Affiliation | Academia | Paria Rashidinejad, Department of EECS, UC Berkeley, Berkeley, CA, 94709, paria.rashidinejad@berkeley.edu; Banghua Zhu, Department of EECS, UC Berkeley, Berkeley, CA, 94709, banghua@berkeley.edu; Cong Ma, Department of Statistics, University of Chicago, Chicago, IL, 60637, congm@uchicago.edu; Jiantao Jiao, Department of EECS, UC Berkeley, Berkeley, CA, 94709, jiantao@berkeley.edu; Stuart Russell, Department of EECS, UC Berkeley, Berkeley, CA, 94709, russell@berkeley.edu |
| Pseudocode | Yes | Algorithm 1 LCB for bandits and contextual bandits; Algorithm 2 Offline value iteration with LCB (VI-LCB) |
| Open Source Code | No | We do not include any experiments. Our work does not use any assets. |
| Open Datasets | No | The paper explicitly states 'We do not include any experiments.' and 'Our work does not use any assets.', indicating that the authors neither used nor provided a dataset. |
| Dataset Splits | No | We do not include any experiments. |
| Hardware Specification | No | We do not include any experiments. |
| Software Dependencies | No | We do not include any experiments. |
| Experiment Setup | No | We do not include any experiments. |