Provably and Practically Efficient Adversarial Imitation Learning with General Function Approximation

Authors: Tian Xu, Zhilong Zhang, Ruishuo Chen, Yihao Sun, Yang Yu

NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical studies demonstrate that OPT-AIL outperforms previous state-of-the-art deep AIL methods in several challenging tasks.
Researcher Affiliation | Collaboration | ¹National Key Laboratory for Novel Software Technology, Nanjing University, China; ²School of Artificial Intelligence, Nanjing University, China; ³Polixir.ai
Pseudocode | Yes | Algorithm 1: Optimization-based Adversarial Imitation Learning. (A hedged sketch of the discriminator update appears after this table.)
Open Source Code | Yes | The code is available at https://github.com/LAMDA-RL/OPT-AIL.
Open Datasets | Yes | We conduct experiments on 8 tasks sourced from the feature-based DMControl benchmark [53]. (A minimal environment-loading example appears after this table.)
Dataset Splits | No | No explicit train/validation/test splits (e.g., percentages or counts) are reported. The paper learns from expert trajectories and environment interactions, which are generated on the fly in the DMControl benchmark rather than drawn from a fixed, pre-split dataset.
Hardware Specification | Yes | The experiments are conducted on a machine with 64 CPU cores and four RTX 4090 GPUs.
Software Dependencies | No | The paper states that it builds on the open-source IQ-Learn framework and SAC [17] for policy updates, with Adam as the optimizer, but it does not provide version numbers for these software components.
Experiment Setup | Yes | A comprehensive enumeration of the hyperparameters of OPT-AIL is provided in Table 2, reproduced below (see also the configuration sketch after it).

Table 2: OPT-AIL hyperparameters
Parameter | Value
discount (γ) | 0.99
gradient penalty coefficient (β) | 1, 10
optimism regularization coefficient (λ) | 10^-3
temperature (α) | 10^-2
replay buffer size | 5 × 10^5
batch size | 256
optimizer | Adam
discriminator | learning rate 3 × 10^-5; 2 hidden layers; 256 hidden units per layer; ReLU activation
actor | learning rate 3 × 10^-5; 2 hidden layers; 256 hidden units per layer; ReLU activation
critic | learning rate 3 × 10^-4; 2 hidden layers; 256 hidden units per layer; ReLU activation
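For convenience, the Table 2 values transcribe into a single configuration object as follows. This dataclass is purely an illustrative container, not part of the released code; the field names are hypothetical, while the values are copied from the paper.

```python
from dataclasses import dataclass

@dataclass
class OptAilConfig:
    """OPT-AIL hyperparameters as listed in Table 2 (illustrative container;
    field names are hypothetical, values are taken from the paper)."""
    discount: float = 0.99              # gamma
    gradient_penalty_beta: float = 1.0  # Table 2 lists both 1 and 10
    optimism_lambda: float = 1e-3
    temperature_alpha: float = 1e-2
    replay_buffer_size: int = 500_000   # 5 * 10^5
    batch_size: int = 256
    optimizer: str = "Adam"
    discriminator_lr: float = 3e-5
    actor_lr: float = 3e-5
    critic_lr: float = 3e-4
    num_hidden_layers: int = 2          # same for discriminator, actor, critic
    hidden_units_per_layer: int = 256
    activation: str = "ReLU"
```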
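Regarding the pseudocode row: the paper gives Algorithm 1 (Optimization-based Adversarial Imitation Learning) in full, and the official implementation is at https://github.com/LAMDA-RL/OPT-AIL. Below is only a minimal sketch of a discriminator update that combines the gradient penalty coefficient (β) and the optimism regularization coefficient (λ) from Table 2. The WGAN-GP-style penalty and the `-expert_logits.mean()` optimism term are assumptions chosen for illustration, not the paper's exact objective.

```python
import torch
import torch.nn.functional as F

def discriminator_loss(disc, expert_batch, policy_batch, beta=1.0, lam=1e-3):
    """Sketch of an AIL discriminator loss with gradient penalty (beta)
    and an optimism regularizer (lam); the exact terms are assumptions."""
    expert_logits = disc(expert_batch)
    policy_logits = disc(policy_batch)

    # Standard GAIL-style logistic loss: expert samples -> 1, policy samples -> 0.
    loss = (
        F.binary_cross_entropy_with_logits(expert_logits, torch.ones_like(expert_logits))
        + F.binary_cross_entropy_with_logits(policy_logits, torch.zeros_like(policy_logits))
    )

    # Gradient penalty on interpolated inputs (WGAN-GP style; an assumption
    # about the form of the penalty weighted by beta).
    eps = torch.rand(expert_batch.size(0), 1, device=expert_batch.device)
    mixed = (eps * expert_batch + (1 - eps) * policy_batch).requires_grad_(True)
    grads = torch.autograd.grad(disc(mixed).sum(), mixed, create_graph=True)[0]
    grad_penalty = ((grads.norm(2, dim=1) - 1.0) ** 2).mean()

    # Hypothetical optimism term weighted by lam: pushes the learned reward
    # up on expert data; the paper's Algorithm 1 defines the actual objective.
    optimism = -expert_logits.mean()

    return loss + beta * grad_penalty + lam * optimism
```

In the full training loop, this step would alternate with a SAC-style actor-critic update at the learning rates listed in Table 2.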
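Regarding the datasets row: the experiments run on 8 feature-based DMControl tasks. A minimal environment-loading example using the standard dm_control suite API is sketched below; cheetah/run is only a placeholder, since the specific 8 tasks are not enumerated in this table.

```python
# Minimal dm_control example; "cheetah"/"run" is a placeholder task.
from dm_control import suite

env = suite.load(domain_name="cheetah", task_name="run")
time_step = env.reset()
print(time_step.observation)  # feature-based (proprioceptive) observations
print(env.action_spec())      # bounded continuous action specification
```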