Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
On the Value of Interaction and Function Approximation in Imitation Learning
Authors: Nived Rajaraman, Yanjun Han, Lin Yang, Jingbo Liu, Jiantao Jiao, Kannan Ramchandran
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We study the statistical guarantees for the Imitation Learning (IL) problem in episodic MDPs. [22] show an information theoretic lower bound that in the worst case, a learner which can even actively query the expert policy suffers from a suboptimality growing quadratically in the length of the horizon, H. We show that the reduction proposed by [25] is statistically optimal: the resulting algorithm upon interacting with the MDP for N episodes results in a suboptimality bound of $\widetilde{O}(\mu \lvert S \rvert H / N)$ which we show is optimal up to log-factors. |
| Researcher Affiliation | Academia | Nived Rajaraman, University of California, Berkeley; Yanjun Han, University of California, Berkeley; Lin F. Yang, University of California, Los Angeles; Jingbo Liu, University of Illinois, Urbana-Champaign; Jiantao Jiao, University of California, Berkeley; Kannan Ramchandran, University of California, Berkeley |
| Pseudocode | Yes | Algorithm 1 MIMIC-MD under linear-expert and linear rewards assumption |
| Open Source Code | No | The paper does not provide any concrete access to source code for the described methodology. |
| Open Datasets | No | The paper is theoretical and discusses 'a dataset D of N trajectories' in a conceptual manner for theoretical analysis, but does not refer to specific public datasets with access information for training. |
| Dataset Splits | No | The paper is theoretical and does not describe empirical experiments or specific dataset splits (training, validation, test) for reproduction. |
| Hardware Specification | No | The paper is theoretical and does not describe any specific hardware used for running experiments. |
| Software Dependencies | No | The paper is theoretical and does not provide specific ancillary software details with version numbers needed for experimental replication. |
| Experiment Setup | No | The paper is theoretical and does not describe an experimental setup with concrete hyperparameter values, training configurations, or system-level settings. |