Learning from Demonstration: Provably Efficient Adversarial Policy Imitation with Linear Function Approximation
Authors: Zhihan Liu, Yufeng Zhang, Zuyue Fu, Zhuoran Yang, Zhaoran Wang
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Besides the theoretical analysis, we also conduct an experiment for PGAPI in the offline setting. Results show that PGAPI converges fast and exceeds the performance of the BC method. See Appendix G for detailed discussions and experimental results. |
| Researcher Affiliation | Academia | ¹Northwestern University, ²Yale University. Correspondence to: Zhihan Liu <zhihanliu2027@u.northwestern.edu>, Yufeng Zhang <yufengzhang2023@u.northwestern.edu>, Zuyue Fu <zuyue.fu@u.northwestern.edu>, Zhuoran Yang <zhuoran.yang@yale.edu>, Zhaoran Wang <zhaoranwang@gmail.com>. |
| Pseudocode | Yes | C. Pseudocode of OGAPI (Algorithm 2) ... D. Pseudocode of PGAPI (Algorithm 3) |
| Open Source Code | Yes | The code is available at https://github.com/YSLIU627/Adversarial-Policy-Imitation-with-LFA. |
| Open Datasets | No | The paper describes generating its own datasets: 'The expert demonstration D_E is obtained by sampling 5 trajectories from π_E while the additional dataset D_A is collected by sampling 1000 trajectories from the policy which samples uniformly random actions.' No access information for publicly available datasets is provided. |
| Dataset Splits | No | The paper describes the generation of the expert and additional datasets but does not specify training, validation, or test splits of these datasets. It reports the 'optimality gap' and 'average return' without describing any held-out split. |
| Hardware Specification | No | The paper describes a simulated environment but does not provide any specific details about the hardware (e.g., GPU models, CPU types, memory) used to run the simulations or experiments. |
| Software Dependencies | No | The paper mentions that PGAPI is implemented according to Algorithm 3 but does not list specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | We run our proposed method PGAPI for 20 iterations and compare the average return of PGAPI with the performance of the expert π_E, the BC method on D_E, and the BC method on the mixture of D_E and D_A. |
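
The data-collection protocol and the two BC baselines described in the table can be sketched on a small tabular problem. This is a minimal sketch under stated assumptions, not the authors' code: the environment (sizes `S`, `A`, `H`, random transitions and rewards, seed) and every helper name (`optimal_policy`, `sample`, `behavior_cloning`, `evaluate`) are hypothetical, and PGAPI itself (Algorithm 3) is not reproduced. Only the dataset sizes (5 expert trajectories for D_E, 1000 uniform-random trajectories for D_A) and the two BC baselines come from the paper's description. A tabular MDP is the special case of a linear MDP with one-hot features, though the paper's own environment may differ.

```python
import numpy as np

# Hypothetical toy episodic MDP: sizes, horizon, rewards and seed are assumptions.
S, A, H = 5, 3, 10
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(S), size=(H, S, A))  # P[h, s, a] is a next-state distribution
R = rng.uniform(size=(H, S, A))                # reward r_h(s, a) in [0, 1]
init = np.full(S, 1.0 / S)                     # uniform initial-state distribution


def optimal_policy():
    """Backward induction; returns a deterministic policy pi[h, s] -> action."""
    pi = np.zeros((H, S), dtype=int)
    V = np.zeros(S)
    for h in reversed(range(H)):
        Q = R[h] + P[h] @ V  # shape (S, A)
        pi[h] = Q.argmax(axis=1)
        V = Q.max(axis=1)
    return pi


def one_hot_policy(pi):
    """Turn a deterministic policy into action probabilities probs[h, s, a]."""
    probs = np.zeros((H, S, A))
    probs[np.arange(H)[:, None], np.arange(S)[None, :], pi] = 1.0
    return probs


def evaluate(probs):
    """Exact expected return of a stochastic policy (no sampling noise)."""
    V = np.zeros(S)
    for h in reversed(range(H)):
        Q = R[h] + P[h] @ V
        V = (probs[h] * Q).sum(axis=1)
    return float(init @ V)


def sample(probs, n):
    """Roll out n trajectories and return the visited (h, s, a) triples."""
    data = []
    for _ in range(n):
        s = rng.choice(S, p=init)
        for h in range(H):
            a = rng.choice(A, p=probs[h, s])
            data.append((h, s, a))
            s = rng.choice(S, p=P[h, s, a])
    return data


def behavior_cloning(data):
    """Tabular BC: empirical action frequencies per (h, s), uniform if unseen."""
    counts = np.zeros((H, S, A))
    for h, s, a in data:
        counts[h, s, a] += 1
    totals = counts.sum(axis=2, keepdims=True)
    return np.where(totals > 0, counts / np.maximum(totals, 1), 1.0 / A)


expert = one_hot_policy(optimal_policy())
uniform = np.full((H, S, A), 1.0 / A)
D_E = sample(expert, 5)      # 5 expert trajectories
D_A = sample(uniform, 1000)  # 1000 uniform-random trajectories

returns = {
    "expert": evaluate(expert),
    "BC on D_E": evaluate(behavior_cloning(D_E)),
    "BC on D_E + D_A": evaluate(behavior_cloning(D_E + D_A)),
}
for name, value in returns.items():
    print(f"{name}: {value:.3f}")
```

In this toy, BC on the mixture can be expected to drift toward the uniform policy, because the 1000 random trajectories dominate the action counts at almost every (h, s). That is the gap an offline method exploiting D_A more carefully (such as PGAPI) aims to close.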