Apprenticeship Learning via Frank-Wolfe
Authors: Tom Zahavy, Alon Cohen, Haim Kaplan, Yishay Mansour
AAAI 2020, pp. 6720–6728
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we compare the CG and ASCG methods for AL in two AL domains: an autonomous driving simulation (Abbeel and Ng 2004; Syed and Schapire 2008), and a grid world domain. The results in each experiment are averaged over 10 runs of each algorithm (random seeds). The mean is presented in a solid line; around it, the colored area shows the mean plus/minus the standard deviation. |
| Researcher Affiliation | Industry | Tom Zahavy, Alon Cohen, Haim Kaplan, Yishay Mansour Google Research, Tel Aviv |
| Pseudocode | Yes | Algorithm 1 The projection method (Abbeel and Ng 2004), Algorithm 2 The CG method (Frank and Wolfe 1956), Algorithm 3 Frank-Wolfe with away steps (ASCG) |
| Open Source Code | No | The paper does not provide any statement or link indicating that the source code for the described methodology is publicly available. |
| Open Datasets | No | The paper describes custom simulation environments ("5×5 grid world domain", "three-lane highway"), not publicly available datasets with concrete access information like links or citations. |
| Dataset Splits | No | The paper mentions averaging results over multiple runs, but it does not specify any train/validation/test dataset splits or cross-validation setup for the data used in the simulations. |
| Hardware Specification | No | The paper does not specify any details about the hardware (e.g., CPU, GPU models, memory) used to run the experiments. |
| Software Dependencies | No | The paper mentions algorithmic components like "Q-learning" and "ϵ-greedy exploration" with parameter values, but it does not list specific software packages or libraries with version numbers (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | For the grid world: "We used N_Estimation = 300, H = 50, N_RL = 300" and ran both CG and ASCG for N_iter = 100 steps. For the car simulator: "We used N_Estimation = 1000, H = 40, N_RL = 1000" and ran both algorithms for N_iter = 50 steps. Also includes ϵ-greedy exploration with ϵ = 0.05 and a learning rate of α_t = 0.2/t^0.75. |
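The Pseudocode row names Frank-Wolfe (CG) and its away-step variant (ASCG). As a toy illustration of that template — not the authors' AL implementation — the sketch below runs Frank-Wolfe with away steps on a quadratic over the probability simplex; the objective, exact line search, and pruning tolerance are all assumptions for the example.

```python
import numpy as np

def ascg(b, n, n_iter=100):
    """Frank-Wolfe with away steps (ASCG template) minimizing
    f(x) = ||x - b||^2 over the probability simplex in R^n.
    Toy illustration only; not the paper's AL objective."""
    grad = lambda x: 2.0 * (x - b)
    alphas = {0: 1.0}            # convex weights over the active vertex set
    x = np.eye(n)[0].copy()      # start at vertex e_0
    for _ in range(n_iter):
        g = grad(x)
        s_idx = int(np.argmin(g))                 # FW vertex: best simplex vertex
        v_idx = max(alphas, key=lambda i: g[i])   # away vertex: worst active vertex
        d_fw = np.eye(n)[s_idx] - x
        d_away = x - np.eye(n)[v_idx]
        if g @ d_fw <= g @ d_away:                # pick the steeper descent direction
            d, gamma_max, toward, idx = d_fw, 1.0, True, s_idx
        else:
            a_v = alphas[v_idx]
            d, gamma_max, toward, idx = d_away, a_v / (1.0 - a_v), False, v_idx
        # exact line search for the quadratic: gamma* = -g.d / (2 ||d||^2)
        denom = 2.0 * (d @ d)
        gamma = min(gamma_max, max(0.0, -(g @ d) / denom)) if denom > 0 else 0.0
        x = x + gamma * d
        # maintain the convex decomposition of x over active vertices
        if toward:
            alphas = {i: (1 - gamma) * a for i, a in alphas.items()}
            alphas[idx] = alphas.get(idx, 0.0) + gamma
        else:
            alphas = {i: (1 + gamma) * a for i, a in alphas.items()}
            alphas[idx] -= gamma
        alphas = {i: a for i, a in alphas.items() if a > 1e-12}  # drop steps
    return x
```

Away steps are what let the iterate shed weight from poorly chosen early vertices, which is the mechanism behind ASCG's faster convergence compared with plain CG in the paper's experiments.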