On Efficient Online Imitation Learning via Classification
Authors: Yichen Li, Chicheng Zhang
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | In this work, we study classification-based online imitation learning (abbrev. COIL) and the fundamental feasibility of designing oracle-efficient regret-minimization algorithms in this setting, with a focus on the general nonrealizable case. We make the following contributions: (1) we show that in the COIL problem, any proper online learning algorithm cannot guarantee a sublinear regret in general; (2) we propose LOGGER, an improper online learning algorithmic framework, that reduces COIL to online linear optimization, by utilizing a new definition of mixed policy class; (3) we design two oracle-efficient algorithms within the LOGGER framework that enjoy different sample and interaction round complexity tradeoffs, and conduct finite-sample analyses to show their improvements over naive behavior cloning; (4) we show that under the standard complexity-theoretic assumptions, efficient dynamic regret minimization is infeasible in the LOGGER framework. Our work puts classification-based online imitation learning, an important IL setup, on a firmer foundation. (The regret notions referenced here are recalled below the table.) |
| Researcher Affiliation | Academia | Yichen Li, University of Arizona, yichenl@arizona.edu; Chicheng Zhang, University of Arizona, chichengz@cs.arizona.edu |
| Pseudocode | Yes | Algorithm 2 LOGGER: reducing COIL to online linear optimization; Algorithm 3 MFTPL: an oracle-efficient approximation of FTRL. (An illustrative sketch of such an oracle-efficient update appears after the table.) |
| Open Source Code | No | 3. If you ran experiments... (a) Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [N/A] |
| Open Datasets | No | 3. If you ran experiments... (a) Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [N/A]. The paper discusses 'expert demonstrations' and 'samples' but does not provide concrete access information to any public datasets used for training. |
| Dataset Splits | No | 3. If you ran experiments... (b) Did you specify all the training details (e.g., data splits, hyperparameters, how they were chosen)? [N/A]. The paper focuses on theoretical analysis and does not specify data splits for validation. |
| Hardware Specification | No | 3. If you ran experiments... (d) Did you include the total amount of compute and the type of resources used (e.g., type of GPUs, internal cluster, or cloud provider)? [N/A] |
| Software Dependencies | No | 3. If you ran experiments... (b) Did you specify all the training details (e.g., data splits, hyperparameters, how they were chosen)? [N/A]. The paper focuses on theoretical analysis and does not specify software dependencies with version numbers. |
| Experiment Setup | No | 3. If you ran experiments... (b) Did you specify all the training details (e.g., data splits, hyperparameters, how they were chosen)? [N/A]. The paper focuses on theoretical analysis and does not specify experimental setup details such as hyperparameters. |
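
The abstract's contributions (1) and (4) contrast two standard online-learning regret notions. As a hedged recall (textbook definitions; the paper's per-round loss $\ell_t$ is its classification-based imitation loss, and its comparator class is its mixed policy class, neither of which is reproduced here):

$$
\mathrm{Regret}_T = \sum_{t=1}^{T} \ell_t(\pi_t) - \min_{\pi \in \Pi} \sum_{t=1}^{T} \ell_t(\pi),
\qquad
\mathrm{D\text{-}Regret}_T = \sum_{t=1}^{T} \ell_t(\pi_t) - \sum_{t=1}^{T} \min_{\pi \in \Pi} \ell_t(\pi).
$$

Contribution (1) states that no proper learner (one that always plays some $\pi_t \in \Pi$) can keep the first quantity sublinear in general, while contribution (4) states that efficiently minimizing the second, stronger quantity (dynamic regret, whose comparator may change every round) is infeasible under standard complexity-theoretic assumptions.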
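
To make the Pseudocode row concrete, here is a minimal Python sketch of what an oracle-efficient, perturbation-based online learner looks like in general. It is a hypothetical reconstruction from the abstract's description, not the paper's MFTPL: the names `mftpl_step`, `act`, and `oracle`, the `(state, action, cost)` example format, and the bootstrap perturbation scheme are all assumptions made for exposition.

```python
import numpy as np


def mftpl_step(oracle, history, perturbation_pool, n_oracle_calls, rng):
    """One round of a Follow-the-Perturbed-Leader-style update driven by a
    cost-sensitive classification oracle.

    Illustrative sketch only, NOT the paper's Algorithm 3: the perturbation
    scheme (bootstrap-resampling examples from `perturbation_pool`) and all
    names here are assumptions.

    oracle            -- callable: list of (state, action, cost) examples ->
                         policy (state -> action) minimizing empirical cost
                         over the base policy class
    history           -- (state, action, cost) examples from rounds so far
    perturbation_pool -- examples resampled to form the random perturbation
    n_oracle_calls    -- number of perturbed oracle calls; their answers form
                         a uniform mixture, i.e. an improper (mixed) policy
    rng               -- a numpy random Generator
    """
    policies = []
    for _ in range(n_oracle_calls):
        # Perturb the cumulative loss by appending a bootstrap resample of
        # random examples, then solve the perturbed problem with one oracle call.
        idx = rng.integers(len(perturbation_pool), size=len(perturbation_pool))
        perturbed = list(history) + [perturbation_pool[i] for i in idx]
        policies.append(oracle(perturbed))
    return policies  # the learner plays the uniform mixture over these


def act(mixture, state, rng):
    """Act with a mixed policy: draw one member uniformly, then query it."""
    return mixture[rng.integers(len(mixture))](state)


# Shape check with a trivial constant-action "oracle":
# rng = np.random.default_rng(0)
# mixture = mftpl_step(lambda data: (lambda s: 0), history=[],
#                      perturbation_pool=[(0, 0, 1.0)], n_oracle_calls=5, rng=rng)
# assert act(mixture, state=0, rng=rng) == 0
```

The structural point consistent with the abstract is that the learner never enumerates the policy class: it only issues classification-oracle calls, and it outputs a uniform mixture over the returned policies rather than a single class member, i.e., it learns improperly, which contribution (1)'s lower bound for proper algorithms motivates.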