Gradient-Aware Model-Based Policy Search
Authors: Pierluca D'Oro, Alberto Maria Metelli, Andrea Tirinzoni, Matteo Papini, Marcello Restelli
Pages: 3801-3808
AAAI 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we empirically validate GAMPS on benchmark domains analyzing and discussing its properties. ... We now present an experimental evaluation of GAMPS, whose objective is two-fold: assessing the effect of our weighting scheme for model learning and comparing the performance in batch policy optimization of our algorithm against model-based and model-free policy search baselines. |
| Researcher Affiliation | Academia | Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano Piazza Leonardo da Vinci, 32, 20133, Milano, Italy pierluca.doro@mail.polimi.it, {albertomaria.metelli, andrea.tirinzoni, matteo.papini, marcello.restelli}@polimi.it |
| Pseudocode | Yes | Algorithm 1 Gradient-Aware Model-based Policy Search |
| Open Source Code | No | The paper does not provide an explicit statement or link to its open-source code. It only mentions that the full version of the paper is available on arXiv. |
| Open Datasets | Yes | We test its performance on a simulated Minigolf environment (Lazaric, Restelli, and Bonarini 2008; Tirinzoni, Salvini, and Restelli 2019) and the 3-link Swimmer robot control benchmark based on Mujoco (2012). |
| Dataset Splits | No | The paper describes collecting datasets of trajectories ('generate 1000 trajectories', 'collect a dataset of 50 and 100 trajectories') and evaluating performance on the environment. However, it does not specify explicit training, validation, or test dataset splits of these collected trajectories in the conventional supervised learning sense. |
| Hardware Specification | No | The paper does not provide any specific hardware specifications (e.g., GPU models, CPU types) used for running the experiments. |
| Software Dependencies | No | The paper mentions environments like 'Mujoco' and concepts like 'Gaussian policies' and 'neural networks', but it does not specify version numbers for any software dependencies or libraries. |
| Experiment Setup | Yes | We collect experience with a linear policy π_b that is deterministic on the lower area and randomly initialized in the upper area, which is also used as initial policy for learning. ... We use Gaussian policies that are linear on six radial basis function features. ... The policy is linear in the state features and the forward model is a 2-layer neural network with 32 hidden neurons and tanh activation. ... We collect a dataset of 50 and 100 trajectories for minigolf and swimmer respectively, using an explorative policy, and then run the algorithms for 30 and 20 iterations. |
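The Experiment Setup row quotes Gaussian policies that are linear on six radial basis function features. The sketch below is a minimal Python illustration of that kind of policy. It includes the score ∇θ log π(a|s), which likelihood-ratio policy-gradient estimators use. It makes several assumptions: a scalar state, six evenly spaced RBF centres over a hypothetical range [0, 20], a shared RBF width, and a fixed action standard deviation. The class name `LinearRBFGaussianPolicy` and every numeric value are hypothetical and are not taken from the paper.

```python
import math
import random
from typing import List


def rbf_features(state: float, centers: List[float], width: float) -> List[float]:
    """Gaussian radial basis features of a scalar state (centres and width are assumptions)."""
    return [math.exp(-((state - c) ** 2) / (2.0 * width ** 2)) for c in centers]


class LinearRBFGaussianPolicy:
    """Hypothetical policy pi(a|s) = N(theta . phi(s), sigma^2) with a fixed sigma."""

    def __init__(self, centers: List[float], width: float, sigma: float, theta: List[float]):
        if len(theta) != len(centers):
            raise ValueError("theta must have one weight per RBF feature")
        self.centers = centers
        self.width = width
        self.sigma = sigma
        self.theta = list(theta)

    def features(self, state: float) -> List[float]:
        return rbf_features(state, self.centers, self.width)

    def mean(self, state: float) -> float:
        # The action mean is linear in the RBF features.
        return sum(w * f for w, f in zip(self.theta, self.features(state)))

    def sample(self, state: float, rng: random.Random) -> float:
        # Draw an action from the Gaussian centred on the linear mean.
        return rng.gauss(self.mean(state), self.sigma)

    def log_prob(self, state: float, action: float) -> float:
        z = (action - self.mean(state)) / self.sigma
        return -0.5 * z * z - math.log(self.sigma * math.sqrt(2.0 * math.pi))

    def score(self, state: float, action: float) -> List[float]:
        # Gradient of log pi(a|s) w.r.t. theta: (a - mu(s)) / sigma^2 * phi(s).
        coeff = (action - self.mean(state)) / self.sigma ** 2
        return [coeff * f for f in self.features(state)]


if __name__ == "__main__":
    # Six evenly spaced centres (0, 4, ..., 20) over a hypothetical state range.
    centers = [4.0 * i for i in range(6)]
    policy = LinearRBFGaussianPolicy(centers, width=4.0, sigma=0.5, theta=[0.1] * 6)
    rng = random.Random(0)
    s = 7.5
    a = policy.sample(s, rng)
    print(policy.mean(s), a, policy.log_prob(s, a), policy.score(s, a))
```

For a sampled pair (s, a), `policy.score(s, a)` returns one component per RBF feature. A likelihood-ratio gradient estimate multiplies these components by a return or value estimate. The score vanishes when the action equals the policy mean. The Gaussian policy keeps the mean linear in θ, so the score has this closed form and needs no automatic differentiation.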