On the Value of Interaction and Function Approximation in Imitation Learning

Authors: Nived Rajaraman, Yanjun Han, Lin Yang, Jingbo Liu, Jiantao Jiao, Kannan Ramchandran

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | We study the statistical guarantees for the Imitation Learning (IL) problem in episodic MDPs. [22] show an information theoretic lower bound that in the worst case, a learner which can even actively query the expert policy suffers from a suboptimality growing quadratically in the length of the horizon, H. We show that the reduction proposed by [25] is statistically optimal: the resulting algorithm upon interacting with the MDP for N episodes results in a suboptimality bound of $\widetilde{O}(\mu \lvert S \rvert H / N)$ which we show is optimal up to log-factors.
Researcher Affiliation | Academia | Nived Rajaraman, University of California, Berkeley, nived.rajaraman@berkeley.edu; Yanjun Han, University of California, Berkeley, yjhan@berkeley.edu; Lin F. Yang, University of California, Los Angeles, linyang@ee.ucla.edu; Jingbo Liu, University of Illinois, Urbana-Champaign, jingbol@illinois.edu; Jiantao Jiao, University of California, Berkeley, jiantao@eecs.berkeley.edu; Kannan Ramchandran, University of California, Berkeley, kannanr@eecs.berkeley.edu
Pseudocode | Yes | Algorithm 1 MIMIC-MD under linear-expert and linear rewards assumption
Open Source Code | No | The paper does not provide any concrete access to source code for the described methodology.
Open Datasets | No | The paper is theoretical and discusses 'a dataset D of N trajectories' in a conceptual manner for theoretical analysis, but does not refer to specific public datasets with access information for training.
Dataset Splits | No | The paper is theoretical and does not describe empirical experiments or specific dataset splits (training, validation, test) for reproduction.
Hardware Specification | No | The paper is theoretical and does not describe any specific hardware used for running experiments.
Software Dependencies | No | The paper is theoretical and does not provide specific ancillary software details with version numbers needed for experimental replication.
Experiment Setup | No | The paper is theoretical and does not describe an experimental setup with concrete hyperparameter values, training configurations, or system-level settings.