reproducibilityindex.ai

Learning from Demonstration: Provably Efficient Adversarial Policy Imitation with Linear Function Approximation

Authors: Zhihan Liu, Yufeng Zhang, Zuyue Fu, Zhuoran Yang, Zhaoran Wang

ICML 2022

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Besides the theoretical analysis, we also conduct an experiment for PGAPI in the offline setting. Results show that PGAPI can converge fast and exceeds the performance of BC method. See Appendix G for detailed discussions and experimental results.
Researcher Affiliation	Academia	(1) Northwestern University; (2) Yale University. Correspondence to: Zhihan Liu <zhihanliu2027@u.northwestern.edu>, Yufeng Zhang <yufengzhang2023@u.northwestern.edu>, Zuyue Fu <zuyue.fu@u.northwestern.edu>, Zhuoran Yang <zhuoran.yang@yale.edu>, Zhaoran Wang <zhaoranwang@gmail.com>.
Pseudocode	Yes	C. Pseudocode of OGAPI (Algorithm 2) ... D. Pseudocode of PGAPI (Algorithm 3)
Open Source Code	Yes	The codes are available on https://github.com/YSLIU627/Adversarial-Policy-Imitation-with-LFA.
Open Datasets	No	The paper describes generating its own datasets: 'The expert demonstration DE is obtained by sampling 5 trajectories from πE while the additional dataset DA is collected by sampling 1000 trajectories from the policy which samples uniformly random actions.' No access information for publicly available datasets is provided.
Dataset Splits	No	The paper describes the generation of expert and additional datasets but does not specify training, validation, or test splits from these datasets. It evaluates 'optimality gap' and 'average return' directly from the generated data.
Hardware Specification	No	The paper describes a simulated environment but does not provide any specific details about the hardware (e.g., GPU models, CPU types, memory) used to run the simulations or experiments.
Software Dependencies	No	The paper mentions that PGAPI is implemented according to Algorithm 3 but does not list specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup	Yes	We conduct our proposed method PGAPI for 20 iterations and compare the average return of PGAPI with the performance of expert πE, BC method on DE, and BC method on the mixture of DE and DA.
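
The data collection quoted in the Open Datasets row (5 trajectories from the expert policy πE for DE, 1000 trajectories from a uniformly random policy for DA, and their mixture as a BC baseline input) can be sketched as below. This is a minimal illustration, not the authors' code: the chain environment, the horizon, the placeholder expert, and all function names are assumptions introduced here, and only the trajectory counts come from the paper.

```python
import random

# Hypothetical toy chain MDP; the paper's actual environment is not reproduced here.
N_STATES, N_ACTIONS, HORIZON = 5, 2, 10


def reset(rng):
    """Start every episode in state 0 (assumed initial distribution)."""
    return 0


def step(state, action, rng):
    """Action 1 moves right, action 0 moves left, clipped to the chain."""
    move = 1 if action == 1 else -1
    return min(max(state + move, 0), N_STATES - 1)


def expert_policy(state, rng):
    """Placeholder for the expert policy: always move right."""
    return 1


def uniform_policy(state, rng):
    """Behaviour policy that samples actions uniformly at random."""
    return rng.randrange(N_ACTIONS)


def rollout(policy, rng):
    """Collect one trajectory of HORIZON (state, action) pairs."""
    state = reset(rng)
    trajectory = []
    for _ in range(HORIZON):
        action = policy(state, rng)
        trajectory.append((state, action))
        state = step(state, action, rng)
    return trajectory


def collect(policy, n_trajectories, seed):
    """Collect n_trajectories rollouts with a seeded generator."""
    rng = random.Random(seed)
    return [rollout(policy, rng) for _ in range(n_trajectories)]


# Trajectory counts taken from the paper's description.
expert_data = collect(expert_policy, 5, seed=0)         # DE
additional_data = collect(uniform_policy, 1000, seed=1)  # DA
mixture_data = expert_data + additional_data             # input for BC on DE and DA
```

Fixing the seeds in a sketch like this is one way to make the generated datasets repeatable, which matters here because the paper releases code rather than the datasets themselves.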