Provably and Practically Efficient Adversarial Imitation Learning with General Function Approximation
Authors: Tian Xu, Zhilong Zhang, Ruishuo Chen, Yihao Sun, Yang Yu
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical studies demonstrate that OPT-AIL outperforms previous state-of-the-art deep AIL methods in several challenging tasks. |
| Researcher Affiliation | Collaboration | National Key Laboratory for Novel Software Technology, Nanjing University, China; School of Artificial Intelligence, Nanjing University, China; Polixir.ai |
| Pseudocode | Yes | Algorithm 1: Optimization-based Adversarial Imitation Learning (a hedged sketch of one adversarial update step appears after the table) |
| Open Source Code | Yes | The code is available at https://github.com/LAMDA-RL/OPT-AIL. |
| Open Datasets | Yes | We conduct experiments on 8 tasks sourced from the feature-based DMControl benchmark [53] |
| Dataset Splits | No | No explicit train/validation/test dataset splits (e.g., percentages or counts) were found. The paper describes using 'expert trajectories' and 'environment interactions' for learning, which are dynamically generated in the DMControl benchmark. |
| Hardware Specification | Yes | The experiments are conducted on a machine with 64 CPU cores and 4 RTX 4090 GPUs. |
| Software Dependencies | No | The paper mentions using the open-sourced frameworks of IQ-Learn and SAC [17] for policy updates, and Adam as the optimizer, but does not provide version numbers for these software components. |
| Experiment Setup | Yes | A comprehensive enumeration of the OPT-AIL hyperparameters is provided in Table 2: discount (γ) 0.99; gradient penalty coefficient (β) 1 or 10; optimism regularization coefficient (λ) 10^-3; temperature (α) 10^-2; replay buffer size 5 × 10^5; batch size 256; optimizer Adam. Discriminator: learning rate 3 × 10^-5, 2 hidden layers, 256 hidden units per layer, ReLU activation. Actor: learning rate 3 × 10^-5, 2 hidden layers, 256 hidden units per layer, ReLU activation. Critic: learning rate 3 × 10^-4, 2 hidden layers, 256 hidden units per layer, ReLU activation. These values are also collected as a Python dict below. |
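For reference, the Table 2 values above can be collected in a single configuration object. The following is a minimal sketch; the key names are hypothetical (chosen for readability) and do not necessarily match the authors' code at https://github.com/LAMDA-RL/OPT-AIL.

```python
# Hedged sketch: Table 2 hyperparameters of OPT-AIL as a plain Python dict.
# Key names are hypothetical, not taken from the authors' repository.
OPT_AIL_HPARAMS = {
    "discount_gamma": 0.99,
    "gradient_penalty_beta": [1, 10],  # task-dependent, per Table 2
    "optimism_reg_lambda": 1e-3,
    "temperature_alpha": 1e-2,
    "replay_buffer_size": 5 * 10**5,
    "batch_size": 256,
    "optimizer": "Adam",
    "discriminator": {"lr": 3e-5, "hidden_layers": 2, "hidden_units": 256, "activation": "ReLU"},
    "actor":         {"lr": 3e-5, "hidden_layers": 2, "hidden_units": 256, "activation": "ReLU"},
    "critic":        {"lr": 3e-4, "hidden_layers": 2, "hidden_units": 256, "activation": "ReLU"},
}
```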
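The Pseudocode row refers to Algorithm 1 (Optimization-based Adversarial Imitation Learning). Below is a minimal PyTorch sketch of a generic adversarial discriminator update with a gradient penalty, wired to the Table 2 values (β, discriminator learning rate, network width and depth). The function names and the logistic loss form are assumptions based on standard AIL practice, not the authors' Algorithm 1; in particular, the λ-weighted optimism regularization that OPT-AIL adds to its value-function objective is omitted, since its exact form is specified only in the paper.

```python
# Hedged sketch: a generic AIL discriminator step with a WGAN-style
# gradient penalty. NOT the authors' OPT-AIL implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

def make_mlp(in_dim, out_dim=1, hidden=256, layers=2):
    # 2 hidden layers of 256 ReLU units, matching Table 2.
    mods, d = [], in_dim
    for _ in range(layers):
        mods += [nn.Linear(d, hidden), nn.ReLU()]
        d = hidden
    mods.append(nn.Linear(d, out_dim))
    return nn.Sequential(*mods)

def discriminator_step(disc, opt, expert_sa, agent_sa, beta=1.0):
    # Logistic AIL loss: expert state-actions labeled 1, agent's labeled 0.
    loss = (F.softplus(-disc(expert_sa)).mean()
            + F.softplus(disc(agent_sa)).mean())
    # Gradient penalty on interpolated inputs, weighted by β from Table 2.
    eps = torch.rand(expert_sa.size(0), 1)
    mix = (eps * expert_sa + (1 - eps) * agent_sa).requires_grad_(True)
    grad = torch.autograd.grad(disc(mix).sum(), mix, create_graph=True)[0]
    penalty = ((grad.norm(2, dim=1) - 1.0) ** 2).mean()
    opt.zero_grad()
    (loss + beta * penalty).backward()
    opt.step()
    return loss.item()

# Usage with Table 2 values; sa_dim (state dim + action dim) is hypothetical.
sa_dim = 24
disc = make_mlp(sa_dim)
opt = torch.optim.Adam(disc.parameters(), lr=3e-5)
expert_sa, agent_sa = torch.randn(256, sa_dim), torch.randn(256, sa_dim)
discriminator_step(disc, opt, expert_sa, agent_sa, beta=1.0)
```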