Error Bounds of Imitating Policies and Environments

Authors: Tian Xu, Ziniu Li, Yang Yu

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate imitation learning methods on three MuJoCo benchmark tasks in OpenAI Gym [10], where the agent aims to mimic locomotion skills. We consider the following approaches: BC [37], DAgger [40], GAIL [23], the maximum entropy IRL algorithm AIRL [17], and the apprenticeship learning algorithms FEM [1] and GTAL [47]. In particular, FEM and GTAL are based on the improved versions proposed in [24]. Besides GAIL, we also involve WGAIL (see Appendix D) in the comparisons. We run the state-of-the-art algorithm SAC [22] to obtain expert policies. All experiments run with 3 random seeds. Experiment details are given in Appendix E.1. (A minimal sketch of this evaluation protocol appears after the table.)
Researcher Affiliation | Collaboration | 1. National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210023, China; 2. The Chinese University of Hong Kong, Shenzhen, Shenzhen 518172, China; 3. Polixir Technologies, Nanjing 210038, China
Pseudocode | Yes | This procedure is summarized in Algorithm 1 in Appendix C.3.
Open Source Code | No | The paper does not provide an explicit statement or link indicating that the source code for the described methodology is publicly available.
Open Datasets | Yes | We evaluate imitation learning methods on three MuJoCo benchmark tasks in OpenAI Gym [10]; reference [10] is 'OpenAI Gym. arXiv, 1606.01540, 2016.'
Dataset Splits | No | The paper mentions collecting '10 expert trajectories' and using '3 expert trajectories for training' for policy imitation. However, it does not specify explicit training, validation, and test dataset splits (e.g., as percentages or sample counts) needed for reproduction.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, memory amounts) used for running its experiments.
Software Dependencies | No | The paper mentions using 'OpenAI Gym' and 'MuJoCo' benchmark tasks and running 'SAC [22]', but does not specify software names with version numbers (e.g., Python, PyTorch, or TensorFlow versions, or the MuJoCo simulator version) needed for replication.
Experiment Setup | Yes | We follow the same network architecture and hyperparameters as in [23] for GAIL, except that we use the state as input for the discriminator. For the discriminator and policy, we use 2-layer fully-connected neural networks with 64 hidden units and ReLU activation functions, except that the last layer of the discriminator uses a sigmoid activation. We use the Adam optimizer with a learning rate of 3e-4. For SAC, we use default parameters. For DAgger, we set the number of iterations to 50, the batch size to 512, and the learning rate to 3e-4.
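
The Research Type row above outlines the evaluation protocol: expert policies trained with SAC generate demonstration trajectories on Gym MuJoCo tasks, imitation learners are fit to them, and results are averaged over 3 random seeds. The sketch below illustrates that pipeline under stated assumptions; it is not the authors' code. `expert_act` and `policy_act` are hypothetical state-to-action callables, the classic Gym reset/step/seed API is assumed, and the default trajectory count follows the '10 expert trajectories' figure quoted in the Dataset Splits row.

```python
# Minimal sketch (assumptions, not the authors' code) of the evaluation
# protocol: collect trajectories from a pre-trained expert (e.g., SAC) on a
# Gym MuJoCo task, then score an imitation policy by its average return over
# several random seeds. Classic Gym reset/step/seed API assumed.
import gym
import numpy as np

def collect_expert_data(env_name, expert_act, n_traj=10):
    """Roll out the expert and return a list of trajectories of (state, action) pairs."""
    env = gym.make(env_name)
    trajectories = []
    for _ in range(n_traj):
        obs, done, traj = env.reset(), False, []
        while not done:
            act = expert_act(obs)          # hypothetical expert: state -> action
            traj.append((obs, act))
            obs, _, done, _ = env.step(act)
        trajectories.append(traj)
    return trajectories

def evaluate(env_name, policy_act, seeds=(0, 1, 2)):
    """Average episodic return of an imitation policy over several random seeds."""
    returns = []
    for seed in seeds:
        env = gym.make(env_name)
        env.seed(seed)
        obs, done, total = env.reset(), False, 0.0
        while not done:
            obs, reward, done, _ = env.step(policy_act(obs))
            total += reward
        returns.append(total)
    return np.mean(returns)
```

A learner such as BC would be fit on the collected (state, action) pairs and then passed to `evaluate` as `policy_act`; the adversarial methods (GAIL, AIRL, WGAIL) would additionally interact with the environment during training.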
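The Experiment Setup row describes the network shapes and optimizer but not a concrete implementation. Below is a minimal sketch of those shapes in PyTorch (the paper does not state which framework was used): 2-layer fully-connected networks with 64 hidden units and ReLU activations, a state-only discriminator ending in a sigmoid, and Adam with a learning rate of 3e-4. The function names and the deterministic policy head are illustrative assumptions.

```python
# Minimal sketch (assumption, not the paper's code) of the stated architecture:
# 2-layer fully-connected networks, 64 hidden units, ReLU activations;
# the discriminator takes only the state and ends with a sigmoid.
import torch
import torch.nn as nn

def make_discriminator(state_dim):
    # State-only input; scalar output squashed to (0, 1) by a sigmoid.
    return nn.Sequential(
        nn.Linear(state_dim, 64), nn.ReLU(),
        nn.Linear(64, 64), nn.ReLU(),
        nn.Linear(64, 1), nn.Sigmoid(),
    )

def make_policy(state_dim, action_dim):
    # Deterministic action head for illustration; the actual policy may be
    # stochastic (e.g., a Gaussian with a learned mean), which this omits.
    return nn.Sequential(
        nn.Linear(state_dim, 64), nn.ReLU(),
        nn.Linear(64, 64), nn.ReLU(),
        nn.Linear(64, action_dim),
    )

# Per the quoted setup, each network would be optimized with Adam at 3e-4:
# optimizer = torch.optim.Adam(net.parameters(), lr=3e-4)
```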