Adversarial Imitation Learning via Boosting

Authors: Jonathan Daniel Chang, Dhruv Sreenivas, Yingbing Huang, Kianté Brantley, Wen Sun

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirically, we evaluate our algorithm on both controller state-based and pixel-based environments from the DeepMind Control Suite. AILBoost outperforms DAC on both types of environments, demonstrating the benefit of properly weighting replay buffer data for off-policy training.
Researcher Affiliation | Academia | Jonathan D. Chang, Department of Computer Science, Cornell University (jdc396@cornell.edu); Dhruv Sreenivas, Department of Computer Science, Cornell University (ds844@cornell.edu); Yingbing Huang, Department of Electrical and Computer Engineering, University of Illinois Urbana-Champaign (yh21@illinois.edu); Kianté Brantley, Department of Computer Science, Cornell University (kdb82@cornell.edu); Wen Sun, Department of Computer Science, Cornell University (ws455@cornell.edu)
Pseudocode | Yes | Algorithm 1 AILBOOST (Adversarial Imitation Learning via Boosting)
Open Source Code | No | The paper does not contain an unambiguous statement about releasing source code for the methodology described, nor does it provide a direct link to a code repository.
Open Datasets | Yes | We evaluate AILBoost on the DeepMind Control Suite (Tassa et al., 2018): Walker Walk, Cheetah Run, Ball in Cup Catch, Quadruped Walk, and Humanoid Stand. For each game, we train an expert RL agent using the environment's reward and collect 10 demonstrations, which we use as the expert dataset throughout our experiments.
Dataset Splits | No | The paper discusses training and evaluation but does not explicitly detail a separate validation dataset split with specific percentages, counts, or methodologies.
Hardware Specification | No | The paper mentions using specific RL algorithms (SAC, DrQ-v2) for training but does not provide any specific hardware details such as CPU/GPU models, memory, or cloud computing instances used for running its experiments.
Software Dependencies | No | The paper mentions using SAC and DrQ-v2 (referencing 'pytorch_sac' in the bibliography) but does not provide specific version numbers for software components like Python, PyTorch, or other libraries that would be necessary for a reproducible setup.
Experiment Setup | Yes | Table 4: Hyperparameters used for DAC and AILBoost. All of DAC's hyperparameters are shared by AILBoost except for the parameters colored in blue. In particular, the update frequency of the discriminator vs. the policy is one of the core differences between DAC and AILBoost.
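The Research Type row credits AILBoost's gains to "properly weighting replay buffer data for off-policy training." A minimal sketch of what weighted replay sampling can look like, assuming one buffer per past policy checkpoint and a given set of mixture weights (the function name, buffer layout, and weighting scheme are illustrative assumptions, not the paper's implementation):

```python
import numpy as np

def sample_weighted_replay(buffers, mixture_weights, batch_size, rng):
    """Draw a minibatch where each sample's source buffer is chosen in
    proportion to its mixture weight, instead of uniformly over all
    stored transitions. Illustrative sketch only."""
    w = np.asarray(mixture_weights, dtype=float)
    probs = w / w.sum()
    # Pick which checkpoint's buffer each sample comes from.
    choices = rng.choice(len(buffers), size=batch_size, p=probs)
    # Then sample uniformly within the chosen buffer.
    return [buffers[i][rng.integers(len(buffers[i]))] for i in choices]
```

With weights concentrated on later, stronger policies, the minibatch distribution approximates the boosted mixture rather than the raw contents of the replay buffer.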
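The Experiment Setup row highlights the discriminator-vs-policy update frequency as a core difference between DAC and AILBoost. A hedged sketch of how such interleaving is commonly structured (the loop shape, argument names, and ratio are placeholder assumptions, not values from Table 4):

```python
def train_loop(n_steps, disc_update_every, update_policy, update_discriminator):
    """Update the policy every step, but the discriminator only every
    `disc_update_every` steps; that ratio is the hyperparameter the
    table highlights. Illustrative sketch only."""
    for step in range(1, n_steps + 1):
        update_policy(step)
        if step % disc_update_every == 0:
            update_discriminator(step)
```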