Provably and Practically Efficient Adversarial Imitation Learning with General Function Approximation

Authors: Tian Xu, Zhilong Zhang, Ruishuo Chen, Yihao Sun, Yang Yu

NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical studies demonstrate that OPT-AIL outperforms previous state-of-the-art deep AIL methods in several challenging tasks.
Researcher Affiliation | Collaboration | ¹National Key Laboratory for Novel Software Technology, Nanjing University, China; ²School of Artificial Intelligence, Nanjing University, China; ³Polixir.ai
Pseudocode | Yes | Algorithm 1: Optimization-based Adversarial Imitation Learning. (A hedged sketch of the discriminator update appears after this table.)
Open Source Code | Yes | The code is available at https://github.com/LAMDA-RL/OPT-AIL.
Open Datasets | Yes | We conduct experiments on 8 tasks sourced from the feature-based DMControl benchmark [53]. (A minimal environment-loading example appears after this table.)
Dataset Splits | No | No explicit train/validation/test splits (e.g., percentages or counts) are reported. The paper learns from expert trajectories and environment interactions, which are generated on the fly in the DMControl benchmark rather than drawn from a fixed, pre-split dataset.
Hardware Specification | Yes | The experiments are conducted on a machine with 64 CPU cores and four RTX 4090 GPUs.
Software Dependencies | No | The paper states that it builds on the open-source IQ-Learn framework and SAC [17] for policy updates, with Adam as the optimizer, but it does not provide version numbers for these software components.
Experiment Setup | Yes | A comprehensive enumeration of the hyperparameters of OPT-AIL is provided in Table 2, reproduced below (see also the configuration sketch after it).

Table 2: OPT-AIL hyperparameters
Parameter | Value
discount (γ) | 0.99
gradient penalty coefficient (β) | 1, 10
optimism regularization coefficient (λ) | 10^-3
temperature (α) | 10^-2
replay buffer size | 5 × 10^5
batch size | 256
optimizer | Adam
discriminator | learning rate 3 × 10^-5; 2 hidden layers; 256 hidden units per layer; ReLU activation
actor | learning rate 3 × 10^-5; 2 hidden layers; 256 hidden units per layer; ReLU activation
critic | learning rate 3 × 10^-4; 2 hidden layers; 256 hidden units per layer; ReLU activation
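For convenience, the Table 2 values transcribe into a single configuration object as follows. This dataclass is purely an illustrative container, not part of the released code; the field names are hypothetical, while the values are copied from the paper.

```python
from dataclasses import dataclass

@dataclass
class OptAilConfig:
    """OPT-AIL hyperparameters as listed in Table 2 (illustrative container;
    field names are hypothetical, values are taken from the paper)."""
    discount: float = 0.99              # gamma
    gradient_penalty_beta: float = 1.0  # Table 2 lists both 1 and 10
    optimism_lambda: float = 1e-3
    temperature_alpha: float = 1e-2
    replay_buffer_size: int = 500_000   # 5 * 10^5
    batch_size: int = 256
    optimizer: str = "Adam"
    discriminator_lr: float = 3e-5
    actor_lr: float = 3e-5
    critic_lr: float = 3e-4
    num_hidden_layers: int = 2          # same for discriminator, actor, critic
    hidden_units_per_layer: int = 256
    activation: str = "ReLU"
```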
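Regarding the pseudocode row: the paper gives Algorithm 1 (Optimization-based Adversarial Imitation Learning) in full, and the official implementation is at https://github.com/LAMDA-RL/OPT-AIL. Below is only a minimal sketch of a discriminator update that combines the gradient penalty coefficient (β) and the optimism regularization coefficient (λ) from Table 2. The WGAN-GP-style penalty and the `-expert_logits.mean()` optimism term are assumptions chosen for illustration, not the paper's exact objective.

```python
import torch
import torch.nn.functional as F

def discriminator_loss(disc, expert_batch, policy_batch, beta=1.0, lam=1e-3):
    """Sketch of an AIL discriminator loss with gradient penalty (beta)
    and an optimism regularizer (lam); the exact terms are assumptions."""
    expert_logits = disc(expert_batch)
    policy_logits = disc(policy_batch)

    # Standard GAIL-style logistic loss: expert samples -> 1, policy samples -> 0.
    loss = (
        F.binary_cross_entropy_with_logits(expert_logits, torch.ones_like(expert_logits))
        + F.binary_cross_entropy_with_logits(policy_logits, torch.zeros_like(policy_logits))
    )

    # Gradient penalty on interpolated inputs (WGAN-GP style; an assumption
    # about the form of the penalty weighted by beta).
    eps = torch.rand(expert_batch.size(0), 1, device=expert_batch.device)
    mixed = (eps * expert_batch + (1 - eps) * policy_batch).requires_grad_(True)
    grads = torch.autograd.grad(disc(mixed).sum(), mixed, create_graph=True)[0]
    grad_penalty = ((grads.norm(2, dim=1) - 1.0) ** 2).mean()

    # Hypothetical optimism term weighted by lam: pushes the learned reward
    # up on expert data; the paper's Algorithm 1 defines the actual objective.
    optimism = -expert_logits.mean()

    return loss + beta * grad_penalty + lam * optimism
```

In the full training loop, this step would alternate with a SAC-style actor-critic update at the learning rates listed in Table 2.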
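Regarding the datasets row: the experiments run on 8 feature-based DMControl tasks. A minimal environment-loading example using the standard dm_control suite API is sketched below; cheetah/run is only a placeholder, since the specific 8 tasks are not enumerated in this table.

```python
# Minimal dm_control example; "cheetah"/"run" is a placeholder task.
from dm_control import suite

env = suite.load(domain_name="cheetah", task_name="run")
time_step = env.reset()
print(time_step.observation)  # feature-based (proprioceptive) observations
print(env.action_spec())      # bounded continuous action specification
```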