On the Value of Interaction and Function Approximation in Imitation Learning

Authors: Nived Rajaraman, Yanjun Han, Lin Yang, Jingbo Liu, Jiantao Jiao, Kannan Ramchandran

NeurIPS 2021

Reproducibility Variable	Result	LLM Response
Research Type	Theoretical	We study the statistical guarantees for the Imitation Learning (IL) problem in episodic MDPs. [22] show an information theoretic lower bound that in the worst case, a learner which can even actively query the expert policy suffers from a suboptimality growing quadratically in the length of the horizon, H. We show that the reduction proposed by [25] is statistically optimal: the resulting algorithm upon interacting with the MDP for N episodes results in a suboptimality bound of $\widetilde{O}(\mu |S| H / N)$ which we show is optimal up to log-factors.
Researcher Affiliation	Academia	Nived Rajaraman University of California, Berkeley nived.rajaraman@berkeley.edu; Yanjun Han University of California, Berkeley yjhan@berkeley.edu; Lin F. Yang University of California, Los Angeles linyang@ee.ucla.edu; Jingbo Liu University of Illinois, Urbana-Champaign jingbol@illinois.edu; Jiantao Jiao University of California, Berkeley jiantao@eecs.berkeley.edu; Kannan Ramchandran University of California, Berkeley kannanr@eecs.berkeley.edu
Pseudocode	Yes	Algorithm 1 MIMIC-MD under linear-expert and linear rewards assumption
Open Source Code	No	The paper does not provide any concrete access to source code for the described methodology.
Open Datasets	No	The paper is theoretical and discusses 'a dataset D of N trajectories' in a conceptual manner for theoretical analysis, but does not refer to specific public datasets with access information for training.
Dataset Splits	No	The paper is theoretical and does not describe empirical experiments or specific dataset splits (training, validation, test) for reproduction.
Hardware Specification	No	The paper is theoretical and does not describe any specific hardware used for running experiments.
Software Dependencies	No	The paper is theoretical and does not provide specific ancillary software details with version numbers needed for experimental replication.
Experiment Setup	No	The paper is theoretical and does not describe an experimental setup with concrete hyperparameter values, training configurations, or system-level settings.