Multi-Objective Reinforcement Learning with Continuous Pareto Frontier Approximation

Authors: Matteo Pirotta, Simone Parisi, Marcello Restelli

AAAI 2015 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The properties of the proposed approach are empirically evaluated on two interesting MOMDPs. In this section, results of the numerical simulations of the PMGA algorithm in continuous domains are presented. Performance is compared against value-based and gradient-based algorithms: Stochastic Dynamic Programming (SDP), Multi-Objective Fitted Q-Iteration (MOFQI) (Castelletti, Pianosi, and Restelli 2013), the Pareto Following Algorithm (PFA), and the Radial Algorithm (RA) (Parisi et al. 2014).
Researcher Affiliation | Academia | Matteo Pirotta, Simone Parisi, and Marcello Restelli, Department of Electronics, Information and Bioengineering, Politecnico di Milano, Piazza Leonardo da Vinci 32, 20133 Milan, Italy. matteo.pirotta@polimi.it, simone.parisi@mail.polimi.it, marcello.restelli@polimi.it
Pseudocode | No | No structured pseudocode or algorithm blocks (e.g., a figure or section explicitly labeled 'Algorithm' or 'Pseudocode') were found in the paper.
Open Source Code | No | The paper references 'Multi-objective reinforcement learning with continuous pareto frontier approximation supplementary material. CoRR abs/1406.3497.' This points to supplementary material, not an explicit statement that the source code for the methodology described in this paper is available.
Open Datasets | No | The paper discusses experiments on problem domains such as the 'Linear-Quadratic Gaussian regulator (LQG)' and the 'water reservoir problem'. These are simulated environments, not publicly available datasets that require specific access information or citations to data repositories.
Dataset Splits | No | The paper does not provide specific training/test/validation dataset splits (e.g., percentages, sample counts, or references to predefined splits) because its experiments are based on simulated problem domains rather than fixed datasets.
Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, processor types, memory amounts, or detailed computer specifications) used for running the experiments were provided in the paper.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., 'Python 3.8, PyTorch 1.9, and CUDA 11.1') needed to replicate the experiment.
Experiment Setup | Yes | In all experiments the learning rate α was set by hand-tuning. The integral estimate was performed using a Monte Carlo algorithm fed with only 100 random points. For each value of t, 100 trajectories of 100 steps each were used to estimate the gradient and Hessian of the policy performance. PMGA parameters: #Episodes = 30, #steps = 30. Learning starts from an arbitrary parametrization with all parameters ρ_i set to 20.
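
The reported setup can be summarized in a short configuration-and-loop sketch. The Python below is a minimal, hypothetical illustration and not the authors' implementation: estimate_manifold_gradient is a placeholder for the actual estimator (which, per the paper, uses 100 trajectories of 100 steps for each sampled t to estimate the gradient and Hessian of the policy performance), the learning-rate value is a guess since the paper only states it was hand-tuned, and the number of manifold parameters ρ_i depends on the chosen parametrization.

import numpy as np

# Hyperparameters as reported in the paper.
NUM_EPISODES = 30        # PMGA #Episodes
NUM_STEPS = 30           # PMGA #steps (reported; not used directly in this sketch)
NUM_MC_POINTS = 100      # random points for the Monte Carlo integral estimate
NUM_TRAJECTORIES = 100   # trajectories per sampled value of t
TRAJECTORY_LENGTH = 100  # steps per trajectory
LEARNING_RATE = 1e-3     # hand-tuned in the paper; this value is a placeholder
NUM_RHO = 2              # number of manifold parameters rho_i (problem dependent)

rng = np.random.default_rng(0)

def estimate_manifold_gradient(rho, t, n_trajectories, horizon):
    # Placeholder for the per-t estimate of the gradient (and Hessian) of the
    # policy performance, obtained in the paper from 100 trajectories of 100
    # steps. A real implementation would roll out the policy induced by rho and t.
    return rng.normal(size=rho.shape) * 0.01

rho = np.full(NUM_RHO, 20.0)  # all rho_i initialized to 20, as reported

for episode in range(NUM_EPISODES):
    # Monte Carlo estimate of the integral over the manifold parameter t,
    # using NUM_MC_POINTS uniformly sampled points.
    ts = rng.uniform(0.0, 1.0, size=NUM_MC_POINTS)
    grad = np.zeros_like(rho)
    for t in ts:
        grad += estimate_manifold_gradient(rho, t, NUM_TRAJECTORIES, TRAJECTORY_LENGTH)
    grad /= NUM_MC_POINTS
    rho += LEARNING_RATE * grad  # gradient ascent on the Pareto-manifold objective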