Estimating the Maximum Expected Value in Continuous Reinforcement Learning Problems

Authors: Carlo D'Eramo, Alessandro Nuara, Matteo Pirotta, Marcello Restelli

AAAI 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "To evaluate the effectiveness of the proposed approach we perform empirical comparisons with related approaches. In this section we evaluate the performance of ME, DE and WE on three sequential decision-making problems: one Multi-Armed Bandit (MAB) problem and an MDP with both finite and continuous actions."
Researcher Affiliation | Academia | "Carlo D'Eramo, Alessandro Nuara, Matteo Pirotta, Marcello Restelli. Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, Piazza Leonardo da Vinci, 32, 20133, Milano, Italy. carlo.deramo@polimi.it, alessandro.nuara@mail.polimi.it, matteo.pirotta@polimi.it, marcello.restelli@polimi.it"
Pseudocode | Yes | Algorithm 1 (Double FQI), Algorithm 2 (Weighted FQI, finite actions), Algorithm 3 (Weighted FQI, continuous actions)
Open Source Code | No | The paper does not provide any explicit statement about releasing source code or a link to a code repository for the described methodology.
Open Datasets | No | The paper describes generating samples for the Pricing Problem and collecting training sets with a random policy for the Swing-up Pendulum, but does not provide access information or citations for a publicly available dataset.
Dataset Splits | No | The paper mentions collecting training sets and evaluating performance on different initial conditions, but does not provide specific train/validation/test splits with percentages, counts, or references to predefined splits.
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used to run the experiments.
Software Dependencies | No | The paper mentions using Gaussian Process regression but does not specify any software libraries or dependencies with version numbers.
Experiment Setup | Yes | "Results are averaged on 50 runs in order to show confidence intervals at 95%. The GP uses a squared exponential kernel with independent length scale for each input dimension (ARD SE). The hyperparameters are fitted on the samples and the input values are normalized between [-1, 1]. ... The FQI horizon is 10 iterations."
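As context for the ME/DE/WE comparison cited in the Research Type row: the Maximum Estimator (ME) takes the maximum of the sample means, while the Weighted Estimator (WE) from the authors' earlier finite-action work weights each sample mean by the probability, under a Gaussian approximation, that it is the largest. A minimal sketch for the bandit case (the function name and the grid-integration details are our assumptions, not taken from the paper):

```python
import numpy as np
from scipy.stats import norm

def weighted_estimator(means, std_errs, grid_size=2000):
    """Weighted Estimator (WE) of the maximum expected value.

    Each arm's sample mean is approximated as a Gaussian and weighted by
    the probability that it is the largest:
        WE = sum_i w_i * mu_i,  with  w_i = P(mu_i is the maximum).
    """
    mu = np.asarray(means, dtype=float)
    s = np.asarray(std_errs, dtype=float)
    # Shared integration grid covering all arms' Gaussians.
    x = np.linspace((mu - 6 * s).min(), (mu + 6 * s).max(), grid_size)
    dx = x[1] - x[0]
    # CDF of each arm's Gaussian evaluated on the grid (one row per arm).
    cdfs = norm.cdf((x[None, :] - mu[:, None]) / s[:, None])
    w = np.empty(mu.size)
    for i in range(mu.size):
        # w_i = integral of pdf_i(x) * prod_{j != i} cdf_j(x) dx
        others = np.prod(np.delete(cdfs, i, axis=0), axis=0)
        w[i] = np.sum(norm.pdf(x, mu[i], s[i]) * others) * dx
    w /= w.sum()
    return float(w @ mu), w
```

For identically distributed arms, ME is positively biased (the max of noisy means overestimates the true maximum), whereas WE spreads the weights and returns a value close to the common mean.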
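The GP configuration quoted in the Experiment Setup row (ARD SE kernel, hyperparameters fitted on the samples, inputs normalized to [-1, 1]) can be reproduced with off-the-shelf tools. A minimal sketch using scikit-learn — the paper does not name its GP library, so this is an assumed stand-in, not the authors' implementation:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def fit_ard_gp(X, y):
    """Fit a GP with an ARD squared-exponential kernel.

    One length scale per input dimension (anisotropic RBF = ARD SE);
    inputs are rescaled to [-1, 1] per dimension, and the length scales
    are fitted on the samples by marginal-likelihood maximization
    (scikit-learn's default behaviour in fit()).
    """
    X = np.asarray(X, dtype=float)
    lo, hi = X.min(axis=0), X.max(axis=0)
    X_scaled = 2.0 * (X - lo) / (hi - lo) - 1.0
    kernel = RBF(length_scale=np.ones(X.shape[1]))  # per-dimension scales
    gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
    gp.fit(X_scaled, y)
    return gp, (lo, hi)
```

The returned bounds are needed to apply the same [-1, 1] rescaling to query points before calling `gp.predict`.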