On the Value of Interaction and Function Approximation in Imitation Learning
Authors: Nived Rajaraman, Yanjun Han, Lin Yang, Jingbo Liu, Jiantao Jiao, Kannan Ramchandran
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We study the statistical guarantees for the Imitation Learning (IL) problem in episodic MDPs. [22] show an information theoretic lower bound that in the worst case, a learner which can even actively query the expert policy suffers from a suboptimality growing quadratically in the length of the horizon, H. We show that the reduction proposed by [25] is statistically optimal: the resulting algorithm upon interacting with the MDP for N episodes results in a suboptimality bound of $\widetilde{O}(\mu \lvert S \rvert H / N)$ which we show is optimal up to log-factors. |
| Researcher Affiliation | Academia | Nived Rajaraman University of California, Berkeley nived.rajaraman@berkeley.edu; Yanjun Han University of California, Berkeley yjhan@berkeley.edu; Lin F. Yang University of California, Los Angeles linyang@ee.ucla.edu; Jingbo Liu University of Illinois, Urbana-Champaign jingbol@illinois.edu; Jiantao Jiao University of California, Berkeley jiantao@eecs.berkeley.edu; Kannan Ramchandran University of California, Berkeley kannanr@eecs.berkeley.edu |
| Pseudocode | Yes | Algorithm 1 MIMIC-MD under linear-expert and linear rewards assumption |
| Open Source Code | No | The paper does not provide any concrete access to source code for the described methodology. |
| Open Datasets | No | The paper is theoretical and discusses 'a dataset D of N trajectories' in a conceptual manner for theoretical analysis, but does not refer to specific public datasets with access information for training. |
| Dataset Splits | No | The paper is theoretical and does not describe empirical experiments or specific dataset splits (training, validation, test) for reproduction. |
| Hardware Specification | No | The paper is theoretical and does not describe any specific hardware used for running experiments. |
| Software Dependencies | No | The paper is theoretical and does not provide specific ancillary software details with version numbers needed for experimental replication. |
| Experiment Setup | No | The paper is theoretical and does not describe an experimental setup with concrete hyperparameter values, training configurations, or system-level settings. |