Multi-Objective Reinforcement Learning with Continuous Pareto Frontier Approximation

Authors: Matteo Pirotta, Simone Parisi, Marcello Restelli

AAAI 2015 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The properties of the proposed approach are empirically evaluated on two interesting MOMDPs. In this section, results of the numerical simulations of the PMGA algorithm in continuous domains are presented. Performance is compared against value-based and gradient-based algorithms: Stochastic Dynamic Programming (SDP), Multi-Objective Fitted Q-Iteration (MOFQI) (Castelletti, Pianosi, and Restelli 2013), the Pareto Following Algorithm (PFA), and the Radial Algorithm (RA) (Parisi et al. 2014).
Researcher Affiliation | Academia | Matteo Pirotta, Simone Parisi, and Marcello Restelli, Department of Electronics, Information and Bioengineering, Politecnico di Milano, Piazza Leonardo da Vinci 32, 20133 Milan, Italy. matteo.pirotta@polimi.it, simone.parisi@mail.polimi.it, marcello.restelli@polimi.it
Pseudocode | No | No structured pseudocode or algorithm blocks (e.g., a figure or section explicitly labeled 'Algorithm' or 'Pseudocode') were found in the paper.
Open Source Code | No | The paper references 'Multi-objective reinforcement learning with continuous pareto frontier approximation supplementary material. CoRR abs/1406.3497.' This points to supplementary material, not an explicit statement that the source code for the methodology described in this paper is available.
Open Datasets | No | The paper discusses experiments on problem domains such as the 'Linear-Quadratic Gaussian regulator (LQG)' and the 'water reservoir problem'. These are simulated environments, not publicly available datasets that require specific access information or citations to data repositories.
Dataset Splits | No | The paper does not provide specific training/test/validation dataset splits (e.g., percentages, sample counts, or references to predefined splits) because its experiments are based on simulated problem domains rather than fixed datasets.
Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, processor types, memory amounts, or detailed computer specifications) used for running the experiments were provided in the paper.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., 'Python 3.8, PyTorch 1.9, and CUDA 11.1') needed to replicate the experiment.
Experiment Setup | Yes | In all experiments the learning rate α was set by hand-tuning. The integral estimate was performed using a Monte Carlo algorithm fed with only 100 random points. For each value of t, 100 trajectories of 100 steps each were used to estimate the gradient and Hessian of the policy performance. PMGA parameters: #Episodes = 30, #steps = 30. Learning starts from an arbitrary parametrization with all parameters ρ_i set to 20.
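
The reported setup can be summarized in a short configuration-and-loop sketch. The Python below is a minimal, hypothetical illustration and not the authors' implementation: estimate_manifold_gradient is a placeholder for the actual estimator (which, per the paper, uses 100 trajectories of 100 steps for each sampled t to estimate the gradient and Hessian of the policy performance), the learning-rate value is a guess since the paper only states it was hand-tuned, and the number of manifold parameters ρ_i depends on the chosen parametrization.

import numpy as np

# Hyperparameters as reported in the paper.
NUM_EPISODES = 30        # PMGA #Episodes
NUM_STEPS = 30           # PMGA #steps (reported; not used directly in this sketch)
NUM_MC_POINTS = 100      # random points for the Monte Carlo integral estimate
NUM_TRAJECTORIES = 100   # trajectories per sampled value of t
TRAJECTORY_LENGTH = 100  # steps per trajectory
LEARNING_RATE = 1e-3     # hand-tuned in the paper; this value is a placeholder
NUM_RHO = 2              # number of manifold parameters rho_i (problem dependent)

rng = np.random.default_rng(0)

def estimate_manifold_gradient(rho, t, n_trajectories, horizon):
    # Placeholder for the per-t estimate of the gradient (and Hessian) of the
    # policy performance, obtained in the paper from 100 trajectories of 100
    # steps. A real implementation would roll out the policy induced by rho and t.
    return rng.normal(size=rho.shape) * 0.01

rho = np.full(NUM_RHO, 20.0)  # all rho_i initialized to 20, as reported

for episode in range(NUM_EPISODES):
    # Monte Carlo estimate of the integral over the manifold parameter t,
    # using NUM_MC_POINTS uniformly sampled points.
    ts = rng.uniform(0.0, 1.0, size=NUM_MC_POINTS)
    grad = np.zeros_like(rho)
    for t in ts:
        grad += estimate_manifold_gradient(rho, t, NUM_TRAJECTORIES, TRAJECTORY_LENGTH)
    grad /= NUM_MC_POINTS
    rho += LEARNING_RATE * grad  # gradient ascent on the Pareto-manifold objective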