Apprenticeship Learning via Frank-Wolfe

Authors: Tom Zahavy, Alon Cohen, Haim Kaplan, Yishay Mansour (pp. 6720-6728)

AAAI 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we compare the CG and ASCG methods for apprenticeship learning (AL) in two domains: an autonomous driving simulation (Abbeel and Ng 2004; Syed and Schapire 2008) and a grid world domain. The results in each experiment are averaged over 10 runs of each algorithm (random seeds). The mean is presented as a solid line; the colored area around it shows the mean plus/minus one standard deviation. (A minimal aggregation sketch of this protocol is given after the table.)
Researcher Affiliation | Industry | Tom Zahavy, Alon Cohen, Haim Kaplan, Yishay Mansour (Google Research, Tel Aviv).
Pseudocode | Yes | Algorithm 1: the projection method (Abbeel and Ng 2004); Algorithm 2: the CG method (Frank and Wolfe 1956); Algorithm 3: Frank-Wolfe with away steps (ASCG). (A simplified sketch of the ASCG loop is given after the table.)
Open Source Code | No | The paper does not provide any statement or link indicating that the source code for the described methodology is publicly available.
Open Datasets | No | The paper describes custom simulation environments (a 5×5 grid world domain and a three-lane highway), not publicly available datasets with concrete access information such as links or citations.
Dataset Splits | No | The paper mentions averaging results over multiple runs, but it does not specify any train/validation/test dataset splits or cross-validation setup for the data used in the simulations.
Hardware Specification | No | The paper does not specify any details about the hardware (e.g., CPU/GPU models or memory) used to run the experiments.
Software Dependencies | No | The paper mentions algorithmic components such as Q-learning and ϵ-greedy exploration with parameter values, but it does not list specific software packages or libraries with version numbers (e.g., Python, PyTorch, or TensorFlow versions).
Experiment Setup | Yes | For the grid world: "We used N_Estimation = 300, H = 50, N_RL = 300 and run both CG and ASCG for N_iter = 100 steps." For the car simulator: "We used N_Estimation = 1000, H = 40, N_RL = 1000 and run both algorithms for N_iter = 50 steps." The setup also includes ϵ-greedy exploration with ϵ = 0.05 and a learning rate of α_t = 0.2/t^0.75. (These values are collected in the configuration sketch after the table.)
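
The evaluation protocol quoted under Research Type (10 random seeds, a solid mean curve with a shaded plus/minus one standard deviation band) can be reproduced with a few lines of NumPy/Matplotlib. This is a minimal sketch rather than the authors' code; run_algorithm is a hypothetical callable that returns one learning curve (an array of per-iteration values) for a given seed.

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_mean_std(run_algorithm, n_seeds=10, label="CG"):
    # Stack one learning curve per random seed into an (n_seeds, n_iter) array.
    curves = np.stack([run_algorithm(seed) for seed in range(n_seeds)])
    mean, std = curves.mean(axis=0), curves.std(axis=0)
    steps = np.arange(len(mean))
    plt.plot(steps, mean, label=label)                          # solid line: mean over seeds
    plt.fill_between(steps, mean - std, mean + std, alpha=0.3)  # shaded band: mean +/- std
    plt.xlabel("iteration")
    plt.legend()
```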
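
To make the Pseudocode row concrete, below is a simplified sketch of the away-step Frank-Wolfe (ASCG) loop over the feature-expectation polytope, minimizing the squared distance to the expert's feature expectations mu_E. It is not the authors' implementation: best_response(w) is an assumed helper that runs the inner RL procedure with reward weights w and returns the feature expectations of the resulting policy (a vertex of the polytope), and drop-step handling and active-set pruning are omitted for brevity.

```python
import numpy as np

def ascg(mu_E, best_response, n_iter=100):
    # Away-step Frank-Wolfe sketch: minimize f(mu) = ||mu - mu_E||^2, with the
    # assumed helper `best_response(w)` acting as the linear minimization oracle.
    active = [best_response(mu_E)]   # active vertices (feature expectations of policies)
    alphas = [1.0]                   # their convex-combination weights
    mu = active[0].copy()
    for t in range(n_iter):
        grad = 2.0 * (mu - mu_E)
        s = best_response(-grad)     # Frank-Wolfe (toward) vertex via the RL oracle
        d_fw = s - mu
        a_idx = max(range(len(active)), key=lambda i: grad @ active[i])
        d_aw = mu - active[a_idx]    # away direction: move weight off the worst active vertex
        if -(grad @ d_fw) >= -(grad @ d_aw):
            d, gamma_max, is_fw = d_fw, 1.0, True
        else:
            a = alphas[a_idx]
            d, gamma_max, is_fw = d_aw, a / (1.0 - a), False
        # Exact line search for the quadratic objective, clipped to the feasible step.
        denom = d @ d
        gamma = 0.0 if denom == 0.0 else min(gamma_max, max(0.0, -(grad @ d) / (2.0 * denom)))
        mu = mu + gamma * d
        # Maintain the convex-combination weights of the active set.
        if is_fw:
            alphas = [a * (1.0 - gamma) for a in alphas]
            active.append(s)
            alphas.append(gamma)
        else:
            alphas = [a * (1.0 + gamma) for a in alphas]
            alphas[a_idx] -= gamma
    return mu
```

Away steps let the iterate move weight off poorly chosen active vertices, which avoids the zig-zagging that slows plain CG near the boundary of the polytope.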
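
Finally, the hyperparameters quoted under Experiment Setup can be collected into a small configuration block. The names mirror the paper's notation (N_Estimation, H, N_RL, N_iter, ϵ, α_t), but the layout itself is only an illustrative assumption, not the authors' configuration format.

```python
# Hyperparameters quoted from the paper; the dict layout is our own assumption.
GRIDWORLD = dict(n_estimation=300, horizon=50, n_rl=300, n_iter=100)
CAR_SIM = dict(n_estimation=1000, horizon=40, n_rl=1000, n_iter=50)

# Q-learning exploration and step-size schedule as stated in the paper:
# epsilon-greedy with epsilon = 0.05 and alpha_t = 0.2 / t**0.75.
EPSILON = 0.05

def learning_rate(t):
    # Decaying step size alpha_t = 0.2 / t**0.75, for t = 1, 2, ...
    return 0.2 / t ** 0.75
```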