Proximal Point Imitation Learning
Authors: Luca Viano, Angeliki Kamoutsi, Gergely Neu, Igor Krawczuk, Volkan Cevher
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | This work develops new algorithms with rigorous efficiency guarantees for infinite-horizon imitation learning (IL) with linear function approximation, without restrictive coherence assumptions. We begin with the minimax formulation of the problem and then outline how to leverage classical tools from optimization, in particular the proximal-point method (PPM) and dual smoothing, for online and offline IL, respectively. Thanks to PPM, we avoid the nested policy-evaluation and cost updates for online IL appearing in the prior literature. In particular, we do away with the conventional alternating updates by optimizing a single convex and smooth objective over both cost and Q-functions. When solved inexactly, we relate the optimization errors to the suboptimality of the recovered policy. As an added bonus, by re-interpreting PPM as dual smoothing with the expert policy as a center point, we also obtain an offline IL algorithm enjoying theoretical guarantees in terms of required expert trajectories. Finally, we achieve convincing empirical performance for both linear and neural network function approximation. |
| Researcher Affiliation | Academia | Luca Viano, LIONS, EPFL, Lausanne, Switzerland (luca.viano@epfl.ch); Angeliki Kamoutsi, ETH Zurich, Zurich, Switzerland (kamoutsa@ethz.ch); Gergely Neu, Universitat Pompeu Fabra, Barcelona, Spain (gergely.neu@gmail.com); Igor Krawczuk, LIONS, EPFL, Lausanne, Switzerland (igor.krawczuk@epfl.ch); Volkan Cevher, LIONS, EPFL, Lausanne, Switzerland (volkan.cevher@epfl.ch) |
| Pseudocode | Yes | Algorithm 1 Proximal Point Imitation Learning: P2IL(Φ, DE, K, η, α) |
| Open Source Code | Yes | The code is available at the following link https://github.com/lviano/P2IL. |
| Open Datasets | No | The paper mentions using 'expert demonstrations DE' sampled from an expert policy in various environments (e.g., RiverSwim, CartPole, Pong, MuJoCo). While these environments are publicly known, the specific expert data generated for the experiments is not provided with concrete access information (a link, DOI, specific citation for the data itself, or clear instructions for exact data reproduction beyond general sampling). |
| Dataset Splits | No | The paper does not explicitly provide details about training/validation/test dataset splits, such as percentages or specific counts for a validation set. |
| Hardware Specification | No | The main body of the paper does not explicitly describe the specific hardware used (e.g., GPU models, CPU types). The checklist mentions resources are specified in the Supplementary Material, but this information is not present in the provided text. |
| Software Dependencies | No | The main body of the paper does not explicitly list specific software dependencies with version numbers. The checklist mentions training details are in the Appendix, but specific software versions are not provided in the given text. |
| Experiment Setup | Yes | Algorithm 1 (P2IL) lists inputs such as 'number of iterations K, step sizes η and α, number of SGD iterations T, SGD learning rates β = {β_t}_{t=0}^{T−1}, number-of-samples function n : ℕ → ℕ'. Additionally, Section 5 states 'The precise setting is detailed in Appendix L.' and the checklist confirms 'Training details are provided in the Appendix.' |
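The abstract's key idea is replacing alternating cost/Q-function updates with repeated proximal-point steps on a single convex objective. As a minimal sketch of the generic proximal-point method (not the P2IL algorithm itself), the snippet below applies the PPM update x_{k+1} = argmin_x f(x) + ||x − x_k||²/(2η) to a strongly convex quadratic, where the step has a closed-form linear solve; the matrix `A`, vector `b`, and step size `eta` are hypothetical illustration choices:

```python
import numpy as np

def ppm_quadratic(A, b, eta=1.0, iters=50):
    """Proximal-point iterations for f(x) = 0.5 x^T A x - b^T x.

    Each step solves argmin_x f(x) + ||x - x_k||^2 / (2*eta), which for a
    quadratic reduces to the linear system (A + I/eta) x = b + x_k/eta.
    """
    x = np.zeros_like(b)
    I = np.eye(len(b))
    for _ in range(iters):
        # Closed-form proximal step for the quadratic objective
        x = np.linalg.solve(A + I / eta, b + x / eta)
    return x

# Hypothetical strongly convex instance for illustration
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])

x_star = np.linalg.solve(A, b)   # exact minimizer of f
x_ppm = ppm_quadratic(A, b)
print(np.allclose(x_ppm, x_star))  # iterates converge to the minimizer
```

Because each proximal step adds a strongly convex regularizer centered at the current iterate, the subproblem is well-conditioned even when f alone is not, which is the stability property the paper exploits to avoid nested update loops.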