Learning from Demonstration: Provably Efficient Adversarial Policy Imitation with Linear Function Approximation

Authors: Zhihan Liu, Yufeng Zhang, Zuyue Fu, Zhuoran Yang, Zhaoran Wang

ICML 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Besides the theoretical analysis, we also conduct an experiment for PGAPI in the offline setting. Results show that PGAPI converges quickly and exceeds the performance of the BC method. See Appendix G for detailed discussions and experimental results.
Researcher Affiliation | Academia | 1 Northwestern University, 2 Yale University. Correspondence to: Zhihan Liu <zhihanliu2027@u.northwestern.edu>, Yufeng Zhang <yufengzhang2023@u.northwestern.edu>, Zuyue Fu <zuyue.fu@u.northwestern.edu>, Zhuoran Yang <zhuoran.yang@yale.edu>, Zhaoran Wang <zhaoranwang@gmail.com>.
Pseudocode | Yes | C. Pseudocode of OGAPI (Algorithm 2) ... D. Pseudocode of PGAPI (Algorithm 3)
Open Source Code | Yes | The code is available at https://github.com/YSLIU627/Adversarial-Policy-Imitation-with-LFA.
Open Datasets | No | The paper describes generating its own datasets: 'The expert demonstration DE is obtained by sampling 5 trajectories from πE while the additional dataset DA is collected by sampling 1000 trajectories from the policy which samples uniformly random actions.' No access information for publicly available datasets is provided (a rollout sketch of this data-collection procedure appears after the table).
Dataset Splits | No | The paper describes the generation of the expert and additional datasets but does not specify training, validation, or test splits. It evaluates the 'optimality gap' and 'average return' directly on the generated data.
Hardware Specification | No | The paper describes a simulated environment but does not provide any specific details about the hardware (e.g., GPU models, CPU types, memory) used to run the simulations or experiments.
Software Dependencies | No | The paper mentions that PGAPI is implemented according to Algorithm 3 but does not list specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup | Yes | We conduct our proposed method PGAPI for 20 iterations and compare the average return of PGAPI with the performance of the expert πE, the BC method on DE, and the BC method on the mixture of DE and DA (see the evaluation sketch after the table).
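
To make the quoted data-collection setup concrete, here is a minimal sketch assuming a Gym-style environment; `env`, `expert_policy`, and the horizon `H` are illustrative placeholders, not names from the authors' repository.

```python
def sample_trajectories(env, policy, num_trajectories, horizon):
    """Roll out `policy` in `env` and return a list of (state, action) trajectories."""
    dataset = []
    for _ in range(num_trajectories):
        trajectory, state = [], env.reset()
        for _ in range(horizon):
            action = policy(state)
            next_state, reward, done, _ = env.step(action)
            trajectory.append((state, action))
            state = next_state
            if done:
                break
        dataset.append(trajectory)
    return dataset

def uniform_random_policy(env):
    """Behavior policy that ignores the state and samples actions uniformly at random."""
    return lambda state: env.action_space.sample()

# Expert demonstrations D_E: 5 trajectories from the expert policy pi_E (placeholder `expert_policy`).
# D_E = sample_trajectories(env, expert_policy, num_trajectories=5, horizon=H)
# Additional dataset D_A: 1000 trajectories from the uniformly random policy.
# D_A = sample_trajectories(env, uniform_random_policy(env), num_trajectories=1000, horizon=H)
```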
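
Similarly, the comparison described in the experiment-setup row can be sketched as below; `run_pgapi` and `behavior_cloning` are hypothetical stand-ins for the paper's Algorithm 3 and the BC baseline, not functions from the released code.

```python
import numpy as np

def average_return(env, policy, num_episodes=10, horizon=1000):
    """Monte Carlo estimate of the average undiscounted return of `policy`."""
    returns = []
    for _ in range(num_episodes):
        state, total = env.reset(), 0.0
        for _ in range(horizon):
            action = policy(state)
            state, reward, done, _ = env.step(action)
            total += reward
            if done:
                break
        returns.append(total)
    return float(np.mean(returns))

# Hypothetical driver mirroring the reported comparison (placeholders throughout):
# pgapi_policy = run_pgapi(D_E, D_A, num_iterations=20)   # offline PGAPI, 20 iterations
# bc_expert    = behavior_cloning(D_E)                    # BC on expert data only
# bc_mixture   = behavior_cloning(D_E + D_A)              # BC on the mixed dataset
# for name, pi in [("expert", expert_policy), ("PGAPI", pgapi_policy),
#                  ("BC on D_E", bc_expert), ("BC on D_E + D_A", bc_mixture)]:
#     print(name, average_return(env, pi))
```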