Gradient-Aware Model-Based Policy Search

Authors: Pierluca D'Oro, Alberto Maria Metelli, Andrea Tirinzoni, Matteo Papini, Marcello Restelli

AAAI 2020, pp. 3801-3808

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Finally, we empirically validate GAMPS on benchmark domains, analyzing and discussing its properties. ... We now present an experimental evaluation of GAMPS, whose objective is two-fold: assessing the effect of our weighting scheme for model learning and comparing the performance in batch policy optimization of our algorithm against model-based and model-free policy search baselines."
Researcher Affiliation | Academia | Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, Piazza Leonardo da Vinci 32, 20133 Milano, Italy. pierluca.doro@mail.polimi.it, {albertomaria.metelli, andrea.tirinzoni, matteo.papini, marcello.restelli}@polimi.it
Pseudocode | Yes | Algorithm 1: Gradient-Aware Model-based Policy Search
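To make the gradient-aware idea concrete, the sketch below shows the step Algorithm 1 revolves around: fitting the forward model by weighted maximum likelihood, where each transition's weight reflects its estimated relevance to the policy gradient. This is a minimal illustration under stated assumptions, not the authors' code: the weight computation is left as an input, and `model` is assumed to return a torch distribution over next states.

```python
import torch

def fit_weighted_model(model, optimizer, transitions, weights, epochs=100):
    """Weighted maximum-likelihood fit of a stochastic forward model
    p(s' | s, a). In GAMPS, `weights` would encode each transition's
    relevance to the policy gradient; here they are a plain input."""
    states, actions, next_states = transitions
    for _ in range(epochs):
        dist = model(states, actions)                  # assumed: returns a torch.distributions object
        nll = -dist.log_prob(next_states).sum(dim=-1)  # per-transition negative log-likelihood
        loss = (weights * nll).mean()                  # gradient-aware reweighting of the MLE loss
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return model
```

Setting all weights to one recovers ordinary maximum-likelihood model learning, which is the baseline the paper's weighting scheme is compared against.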
Open Source Code | No | The paper does not provide an explicit statement or link to its open-source code. It only mentions that the full version of the paper is available on arXiv.
Open Datasets | Yes | "We test its performance on a simulated Minigolf environment (Lazaric, Restelli, and Bonarini 2008; Tirinzoni, Salvini, and Restelli 2019) and the 3-link Swimmer robot control benchmark based on MuJoCo (Todorov, Erez, and Tassa 2012)."
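For orientation, the Swimmer task corresponds to the standard Gym/MuJoCo benchmark. The snippet below is a hedged sketch of the batch data-collection phase the paper describes; the environment id `Swimmer-v2`, the classic 4-tuple `env.step` API, and the random behavioral policy are all assumptions for illustration, not details taken from the paper.

```python
import gym

def collect_trajectories(env_id="Swimmer-v2", n_trajectories=100, horizon=500):
    """Roll out a behavioral policy and store (s, a, r, s') transitions,
    mirroring the batch data-collection phase described in the paper."""
    env = gym.make(env_id)
    dataset = []
    for _ in range(n_trajectories):
        state, trajectory = env.reset(), []
        for _ in range(horizon):
            action = env.action_space.sample()  # placeholder behavioral policy
            next_state, reward, done, _ = env.step(action)
            trajectory.append((state, action, reward, next_state))
            state = next_state
            if done:
                break
        dataset.append(trajectory)
    return dataset
```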
Dataset Splits | No | The paper describes collecting datasets of trajectories ("generate 1000 trajectories", "collect a dataset of 50 and 100 trajectories") and evaluating performance directly on the environment. However, it does not specify explicit training, validation, or test splits of these collected trajectories in the conventional supervised-learning sense.
Hardware Specification | No | The paper does not provide any specific hardware specifications (e.g., GPU models, CPU types) used for running the experiments.
Software Dependencies | No | The paper mentions environments like MuJoCo and concepts like Gaussian policies and neural networks, but it does not specify version numbers for any software dependencies or libraries.
Experiment Setup | Yes | "We collect experience with a linear policy πb that is deterministic on the lower area and randomly initialized in the upper area, which is also used as the initial policy for learning. ... We use Gaussian policies that are linear on six radial basis function features. ... The policy is linear in the state features and the forward model is a 2-layer neural network with 32 hidden neurons and tanh activations. ... We collect a dataset of 50 and 100 trajectories for minigolf and swimmer respectively, using an explorative policy, and then run the algorithms for 30 and 20 iterations."
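The quoted setup maps onto two small function approximators. Below is a minimal PyTorch sketch of one plausible reading: a Gaussian policy linear in six RBF features (minigolf) and a forward model with a single 32-unit tanh hidden layer (swimmer). The RBF centers, the state/action dimensions, and the interpretation of "2-layer" are assumptions, not the authors' exact architecture.

```python
import torch
import torch.nn as nn

def rbf_features(state, centers, bandwidth=1.0):
    """Six radial basis function features of the (assumed 1-D) minigolf state."""
    dists = (state.unsqueeze(-1) - centers) ** 2
    return torch.exp(-dists / (2 * bandwidth ** 2))

class LinearGaussianPolicy(nn.Module):
    """Gaussian policy linear in the RBF features, as in the minigolf setup."""
    def __init__(self, n_features=6):
        super().__init__()
        self.mean_head = nn.Linear(n_features, 1)
        self.log_std = nn.Parameter(torch.zeros(1))

    def forward(self, features):
        return torch.distributions.Normal(self.mean_head(features),
                                          self.log_std.exp())

class ForwardModel(nn.Module):
    """Forward model read here as one 32-unit tanh hidden layer; returns a
    Gaussian over next states so it plugs into a weighted-MLE fit."""
    def __init__(self, state_dim, action_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, 32), nn.Tanh(),
            nn.Linear(32, state_dim),
        )
        self.log_std = nn.Parameter(torch.zeros(state_dim))

    def forward(self, state, action):
        mean = self.net(torch.cat([state, action], dim=-1))
        return torch.distributions.Normal(mean, self.log_std.exp())
```

Returning a `torch.distributions.Normal` from the model keeps this sketch consistent with the weighted maximum-likelihood fit shown earlier on this page.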