Apprenticeship Learning via Frank-Wolfe

Authors: Tom Zahavy, Alon Cohen, Haim Kaplan, Yishay Mansour (pp. 6720-6728)

AAAI 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we compare the CG and ASCG methods for apprenticeship learning (AL) in two domains: an autonomous driving simulation (Abbeel and Ng 2004; Syed and Schapire 2008) and a grid world domain. The results in each experiment are averaged over 10 runs of each algorithm (random seeds). The mean is presented as a solid line; the colored area around it shows the mean plus/minus one standard deviation. (A minimal aggregation sketch of this protocol is given after the table.)
Researcher Affiliation | Industry | Tom Zahavy, Alon Cohen, Haim Kaplan, Yishay Mansour (Google Research, Tel Aviv).
Pseudocode | Yes | Algorithm 1: the projection method (Abbeel and Ng 2004); Algorithm 2: the CG method (Frank and Wolfe 1956); Algorithm 3: Frank-Wolfe with away steps (ASCG). (A simplified sketch of the ASCG loop is given after the table.)
Open Source Code | No | The paper does not provide any statement or link indicating that the source code for the described methodology is publicly available.
Open Datasets | No | The paper describes custom simulation environments (a 5×5 grid world domain and a three-lane highway), not publicly available datasets with concrete access information such as links or citations.
Dataset Splits | No | The paper mentions averaging results over multiple runs, but it does not specify any train/validation/test dataset splits or cross-validation setup for the data used in the simulations.
Hardware Specification | No | The paper does not specify any details about the hardware (e.g., CPU/GPU models or memory) used to run the experiments.
Software Dependencies | No | The paper mentions algorithmic components such as Q-learning and ϵ-greedy exploration with parameter values, but it does not list specific software packages or libraries with version numbers (e.g., Python, PyTorch, or TensorFlow versions).
Experiment Setup | Yes | For the grid world: "We used N_Estimation = 300, H = 50, N_RL = 300 and run both CG and ASCG for N_iter = 100 steps." For the car simulator: "We used N_Estimation = 1000, H = 40, N_RL = 1000 and run both algorithms for N_iter = 50 steps." The setup also includes ϵ-greedy exploration with ϵ = 0.05 and a learning rate of α_t = 0.2/t^0.75. (These values are collected in the configuration sketch after the table.)
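
The evaluation protocol quoted under Research Type (10 random seeds, a solid mean curve with a shaded plus/minus one standard deviation band) can be reproduced with a few lines of NumPy/Matplotlib. This is a minimal sketch rather than the authors' code; run_algorithm is a hypothetical callable that returns one learning curve (an array of per-iteration values) for a given seed.

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_mean_std(run_algorithm, n_seeds=10, label="CG"):
    # Stack one learning curve per random seed into an (n_seeds, n_iter) array.
    curves = np.stack([run_algorithm(seed) for seed in range(n_seeds)])
    mean, std = curves.mean(axis=0), curves.std(axis=0)
    steps = np.arange(len(mean))
    plt.plot(steps, mean, label=label)                          # solid line: mean over seeds
    plt.fill_between(steps, mean - std, mean + std, alpha=0.3)  # shaded band: mean +/- std
    plt.xlabel("iteration")
    plt.legend()
```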
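
To make the Pseudocode row concrete, below is a simplified sketch of the away-step Frank-Wolfe (ASCG) loop over the feature-expectation polytope, minimizing the squared distance to the expert's feature expectations mu_E. It is not the authors' implementation: best_response(w) is an assumed helper that runs the inner RL procedure with reward weights w and returns the feature expectations of the resulting policy (a vertex of the polytope), and drop-step handling and active-set pruning are omitted for brevity.

```python
import numpy as np

def ascg(mu_E, best_response, n_iter=100):
    # Away-step Frank-Wolfe sketch: minimize f(mu) = ||mu - mu_E||^2, with the
    # assumed helper `best_response(w)` acting as the linear minimization oracle.
    active = [best_response(mu_E)]   # active vertices (feature expectations of policies)
    alphas = [1.0]                   # their convex-combination weights
    mu = active[0].copy()
    for t in range(n_iter):
        grad = 2.0 * (mu - mu_E)
        s = best_response(-grad)     # Frank-Wolfe (toward) vertex via the RL oracle
        d_fw = s - mu
        a_idx = max(range(len(active)), key=lambda i: grad @ active[i])
        d_aw = mu - active[a_idx]    # away direction: move weight off the worst active vertex
        if -(grad @ d_fw) >= -(grad @ d_aw):
            d, gamma_max, is_fw = d_fw, 1.0, True
        else:
            a = alphas[a_idx]
            d, gamma_max, is_fw = d_aw, a / (1.0 - a), False
        # Exact line search for the quadratic objective, clipped to the feasible step.
        denom = d @ d
        gamma = 0.0 if denom == 0.0 else min(gamma_max, max(0.0, -(grad @ d) / (2.0 * denom)))
        mu = mu + gamma * d
        # Maintain the convex-combination weights of the active set.
        if is_fw:
            alphas = [a * (1.0 - gamma) for a in alphas]
            active.append(s)
            alphas.append(gamma)
        else:
            alphas = [a * (1.0 + gamma) for a in alphas]
            alphas[a_idx] -= gamma
    return mu
```

Away steps let the iterate move weight off poorly chosen active vertices, which avoids the zig-zagging that slows plain CG near the boundary of the polytope.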
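
Finally, the hyperparameters quoted under Experiment Setup can be collected into a small configuration block. The names mirror the paper's notation (N_Estimation, H, N_RL, N_iter, ϵ, α_t), but the layout itself is only an illustrative assumption, not the authors' configuration format.

```python
# Hyperparameters quoted from the paper; the dict layout is our own assumption.
GRIDWORLD = dict(n_estimation=300, horizon=50, n_rl=300, n_iter=100)
CAR_SIM = dict(n_estimation=1000, horizon=40, n_rl=1000, n_iter=50)

# Q-learning exploration and step-size schedule as stated in the paper:
# epsilon-greedy with epsilon = 0.05 and alpha_t = 0.2 / t**0.75.
EPSILON = 0.05

def learning_rate(t):
    # Decaying step size alpha_t = 0.2 / t**0.75, for t = 1, 2, ...
    return 0.2 / t ** 0.75
```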