Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
A distributional view on multi-objective policy optimization
Authors: Abbas Abdolmaleki, Sandy Huang, Leonard Hasenclever, Michael Neunert, Francis Song, Martina Zambelli, Murilo Martins, Nicolas Heess, Raia Hadsell, Martin Riedmiller
ICML 2020 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the effectiveness of our approach on challenging high-dimensional real and simulated robotics tasks, and show that setting different preferences in our framework allows us to trace out the space of nondominated solutions. |
| Researcher Affiliation | Industry | DeepMind. Correspondence to: Abbas Abdolmaleki <EMAIL>, Sandy H. Huang <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 MO-MPO: One policy improvement step |
| Open Source Code | Yes | Code for MO-MPO will be made available online. |
| Open Datasets | No | No explicit public dataset links, DOIs, or repository names are provided. The paper mentions using "motion capture reference data" from "Hasenclever et al. (2020)" and "treasure values in Yang et al. (2019)" but does not provide direct access information for these data sources. |
| Dataset Splits | No | The paper does not specify training, validation, or test dataset splits (e.g., percentages or sample counts) for reproducibility. |
| Hardware Specification | No | No specific GPU or CPU models, or other detailed hardware specifications for the computing resources used for experiments, are provided. |
| Software Dependencies | Yes | We use CVXOPT (Andersen et al., 2020) as our convex optimization solver. |
| Experiment Setup | Yes | We set ϵ = 0.01 for scalarized MPO. If we start with a uniform policy and run MPO with β = 0.001 until the policy converges... For MO-V-MPO, we set all ϵ_k = 0.01. Also, for each objective, we set ϵ_k = 0.001 and set all others to 0.01. ...for MO-MPO we set ϵ_task = 0.1 and ϵ_force = 0.05, and for scalarized MPO we try [w_task, w_force] = [0.95, 0.05] and [0.8, 0.2]. |
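
For reference, the constraint and weight settings quoted in the Experiment Setup row can be collected into a small configuration. The Python sketch below is purely illustrative: the dictionary keys and grouping are assumptions made here for readability, and only the numeric values come from the quoted text.

```python
# Illustrative only: keys and structure are assumptions; the numeric values
# are taken from the Experiment Setup evidence quoted in the table above.
experiment_setup = {
    "scalarized_mpo": {"epsilon": 0.01},
    "uniform_policy_mpo": {"beta": 0.001},  # beta setting quoted for MPO started from a uniform policy
    "mo_v_mpo": {
        "all_objectives": {"epsilon_k": 0.01},  # same constraint on every objective
        "per_objective_sweep": {                # one objective at a time gets 0.001, the rest stay at 0.01
            "varied_objective": 0.001,
            "other_objectives": 0.01,
        },
    },
    "task_vs_force_experiment": {
        "mo_mpo": {"epsilon_task": 0.1, "epsilon_force": 0.05},
        "scalarized_mpo_weights": [  # [w_task, w_force] settings tried
            [0.95, 0.05],
            [0.8, 0.2],
        ],
    },
}

if __name__ == "__main__":
    from pprint import pprint
    pprint(experiment_setup)
```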