On Computation and Generalization of Generative Adversarial Imitation Learning

Authors: Minshuo Chen, Yizhou Wang, Tianyi Liu, Zhuoran Yang, Xingguo Li, Zhaoran Wang, Tuo Zhao

ICLR 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 'Numerical experiments are provided to support our analysis.'
Researcher Affiliation | Academia | Georgia Tech, Xi'an Jiaotong University, Princeton University, Northwestern University
Pseudocode | No | The paper describes algorithms using equations and textual explanations (e.g., 'we apply the alternating mini-batch stochastic gradient algorithm to (2)'), but it does not include any clearly labeled pseudocode or algorithm blocks. (A hedged sketch of such an alternating update appears after this table.)
Open Source Code | No | The paper does not provide any statements about releasing open-source code or links to a code repository.
Open Datasets | No | The paper mentions standard reinforcement learning tasks such as 'Acrobot', 'Mountain Car', and 'Hopper', and generates 'demonstration data' from an 'expert policy' trained with PPO. While these environments are well known, the paper does not provide a specific link, DOI, or formal citation for the generated demonstration data or for the environments themselves, so the exact datasets used cannot be directly accessed or reproduced.
Dataset Splits | No | The paper states 'The demonstration data for every task contains 500 trajectories' and 'When training GAIL, we randomly select a mini-batch of trajectories', but it does not specify any training, validation, or test dataset splits.
Hardware Specification | No | The paper does not mention any specific hardware (e.g., CPU or GPU models) used to run the experiments.
Software Dependencies | No | The paper mentions using the 'proximal policy optimization (PPO) algorithm' and 'neural networks' but does not specify version numbers for any software dependencies such as Python, PyTorch, TensorFlow, or other libraries.
Experiment Setup | Yes | For policy, we use a fully connected neural network with two hidden layers of 128 neurons in each layer and tanh activation. For reward, we use a fully connected ReLU neural network with two hidden layers of 1024 and 512 neurons, respectively. To implement the kernel reward, we fix the first two layers of the neural network after random initialization and only update the third layer... We choose κ = 1 and µ = 0.3. ... We tune step size parameters for updating the policy and reward, and summarize the numerical results of the step sizes attaining the maximal average episode reward in Figure 1. (A network sketch matching this setup appears below.)
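
A minimal PyTorch sketch of the networks described in the Experiment Setup row, assuming placeholder state and action dimensions, an Adam optimizer, and an illustrative learning rate (none of which are specified in the quoted setup; the quoted κ and µ hyperparameters of the kernel reward are also not modeled here). The frozen-layer trick mirrors the paper's kernel-reward description of fixing the first two randomly initialized layers and updating only the third.

```python
import torch
import torch.nn as nn

state_dim, action_dim = 11, 3  # hypothetical dimensions (e.g., a Hopper-like task)

# Policy: fully connected, two hidden layers of 128 units, tanh activation.
policy = nn.Sequential(
    nn.Linear(state_dim, 128), nn.Tanh(),
    nn.Linear(128, 128), nn.Tanh(),
    nn.Linear(128, action_dim),
)

# Reward: fully connected ReLU network with hidden layers of 1024 and 512 units,
# mapping a state-action pair to a scalar reward.
reward = nn.Sequential(
    nn.Linear(state_dim + action_dim, 1024), nn.ReLU(),
    nn.Linear(1024, 512), nn.ReLU(),
    nn.Linear(512, 1),
)

# Kernel-reward variant: freeze the first two (randomly initialized) layers
# so that only the final linear layer is trained.
for layer in list(reward.children())[:4]:  # the two Linear+ReLU blocks
    for p in layer.parameters():
        p.requires_grad = False

# Optimizer choice and step size are assumptions; the paper reports tuning step sizes.
reward_optimizer = torch.optim.Adam(
    [p for p in reward.parameters() if p.requires_grad], lr=1e-3
)
```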
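
Since the paper gives the alternating mini-batch stochastic gradient updates only in equation form, the following is a hedged sketch of what one such alternating step could look like, continuing the network definitions above. The logistic (GAN-style) reward loss and the plain REINFORCE-style policy surrogate are assumptions for illustration; the paper itself trains the policy with PPO.

```python
import torch
import torch.nn.functional as F

def gail_step(reward_net, reward_opt, policy_opt, expert_sa, agent_sa, agent_logp):
    """One alternating mini-batch update: a gradient step on the reward
    (discriminator), then a gradient step on the policy.

    expert_sa / agent_sa: (batch, state_dim + action_dim) state-action tensors;
    agent_logp: log-probabilities of the agent's actions under the current
    policy, computed with gradients attached.
    """
    # Reward step: push the reward up on expert pairs and down on agent pairs
    # (logistic loss with the reward as logits; the paper's exact form may differ).
    reward_opt.zero_grad()
    d_loss = (F.softplus(-reward_net(expert_sa)).mean()
              + F.softplus(reward_net(agent_sa)).mean())
    d_loss.backward()
    reward_opt.step()

    # Policy step: a score-function surrogate that raises the probability of
    # actions receiving high learned reward (stand-in for the PPO update).
    policy_opt.zero_grad()
    with torch.no_grad():
        r = reward_net(agent_sa).squeeze(-1)  # learned reward, treated as fixed
    policy_loss = -(agent_logp * r).mean()
    policy_loss.backward()
    policy_opt.step()
    return d_loss.item(), policy_loss.item()
```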