Gradient-Aware Model-Based Policy Search

Authors: Pierluca D'Oro, Alberto Maria Metelli, Andrea Tirinzoni, Matteo Papini, Marcello Restelli

AAAI 2020, pp. 3801-3808

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Finally, we empirically validate GAMPS on benchmark domains, analyzing and discussing its properties. ... We now present an experimental evaluation of GAMPS, whose objective is two-fold: assessing the effect of our weighting scheme for model learning and comparing the performance in batch policy optimization of our algorithm against model-based and model-free policy search baselines."
Researcher Affiliation | Academia | Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, Piazza Leonardo da Vinci 32, 20133 Milano, Italy. pierluca.doro@mail.polimi.it, {albertomaria.metelli, andrea.tirinzoni, matteo.papini, marcello.restelli}@polimi.it
Pseudocode | Yes | Algorithm 1: Gradient-Aware Model-based Policy Search
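To make the gradient-aware idea concrete, the sketch below shows the step Algorithm 1 revolves around: fitting the forward model by weighted maximum likelihood, where each transition's weight reflects its estimated relevance to the policy gradient. This is a minimal illustration under stated assumptions, not the authors' code: the weight computation is left as an input, and `model` is assumed to return a torch distribution over next states.

```python
import torch

def fit_weighted_model(model, optimizer, transitions, weights, epochs=100):
    """Weighted maximum-likelihood fit of a stochastic forward model
    p(s' | s, a). In GAMPS, `weights` would encode each transition's
    relevance to the policy gradient; here they are a plain input."""
    states, actions, next_states = transitions
    for _ in range(epochs):
        dist = model(states, actions)                  # assumed: returns a torch.distributions object
        nll = -dist.log_prob(next_states).sum(dim=-1)  # per-transition negative log-likelihood
        loss = (weights * nll).mean()                  # gradient-aware reweighting of the MLE loss
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return model
```

Setting all weights to one recovers ordinary maximum-likelihood model learning, which is the baseline the paper's weighting scheme is compared against.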
Open Source Code | No | The paper does not provide an explicit statement or link to its open-source code. It only mentions that the full version of the paper is available on arXiv.
Open Datasets | Yes | "We test its performance on a simulated Minigolf environment (Lazaric, Restelli, and Bonarini 2008; Tirinzoni, Salvini, and Restelli 2019) and the 3-link Swimmer robot control benchmark based on MuJoCo (Todorov, Erez, and Tassa 2012)."
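For orientation, the Swimmer task corresponds to the standard Gym/MuJoCo benchmark. The snippet below is a hedged sketch of the batch data-collection phase the paper describes; the environment id `Swimmer-v2`, the classic 4-tuple `env.step` API, and the random behavioral policy are all assumptions for illustration, not details taken from the paper.

```python
import gym

def collect_trajectories(env_id="Swimmer-v2", n_trajectories=100, horizon=500):
    """Roll out a behavioral policy and store (s, a, r, s') transitions,
    mirroring the batch data-collection phase described in the paper."""
    env = gym.make(env_id)
    dataset = []
    for _ in range(n_trajectories):
        state, trajectory = env.reset(), []
        for _ in range(horizon):
            action = env.action_space.sample()  # placeholder behavioral policy
            next_state, reward, done, _ = env.step(action)
            trajectory.append((state, action, reward, next_state))
            state = next_state
            if done:
                break
        dataset.append(trajectory)
    return dataset
```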
Dataset Splits | No | The paper describes collecting datasets of trajectories ("generate 1000 trajectories", "collect a dataset of 50 and 100 trajectories") and evaluating performance directly on the environment. However, it does not specify explicit training, validation, or test splits of these collected trajectories in the conventional supervised-learning sense.
Hardware Specification | No | The paper does not provide any specific hardware specifications (e.g., GPU models, CPU types) used for running the experiments.
Software Dependencies | No | The paper mentions environments like MuJoCo and concepts like Gaussian policies and neural networks, but it does not specify version numbers for any software dependencies or libraries.
Experiment Setup | Yes | "We collect experience with a linear policy πb that is deterministic on the lower area and randomly initialized in the upper area, which is also used as the initial policy for learning. ... We use Gaussian policies that are linear on six radial basis function features. ... The policy is linear in the state features and the forward model is a 2-layer neural network with 32 hidden neurons and tanh activations. ... We collect a dataset of 50 and 100 trajectories for minigolf and swimmer respectively, using an explorative policy, and then run the algorithms for 30 and 20 iterations."
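The quoted setup maps onto two small function approximators. Below is a minimal PyTorch sketch of one plausible reading: a Gaussian policy linear in six RBF features (minigolf) and a forward model with a single 32-unit tanh hidden layer (swimmer). The RBF centers, the state/action dimensions, and the interpretation of "2-layer" are assumptions, not the authors' exact architecture.

```python
import torch
import torch.nn as nn

def rbf_features(state, centers, bandwidth=1.0):
    """Six radial basis function features of the (assumed 1-D) minigolf state."""
    dists = (state.unsqueeze(-1) - centers) ** 2
    return torch.exp(-dists / (2 * bandwidth ** 2))

class LinearGaussianPolicy(nn.Module):
    """Gaussian policy linear in the RBF features, as in the minigolf setup."""
    def __init__(self, n_features=6):
        super().__init__()
        self.mean_head = nn.Linear(n_features, 1)
        self.log_std = nn.Parameter(torch.zeros(1))

    def forward(self, features):
        return torch.distributions.Normal(self.mean_head(features),
                                          self.log_std.exp())

class ForwardModel(nn.Module):
    """Forward model read here as one 32-unit tanh hidden layer; returns a
    Gaussian over next states so it plugs into a weighted-MLE fit."""
    def __init__(self, state_dim, action_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, 32), nn.Tanh(),
            nn.Linear(32, state_dim),
        )
        self.log_std = nn.Parameter(torch.zeros(state_dim))

    def forward(self, state, action):
        mean = self.net(torch.cat([state, action], dim=-1))
        return torch.distributions.Normal(mean, self.log_std.exp())
```

Returning a `torch.distributions.Normal` from the model keeps this sketch consistent with the weighted maximum-likelihood fit shown earlier on this page.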