A Distributional View on Multi-Objective Policy Optimization
Authors: Abbas Abdolmaleki, Sandy Huang, Leonard Hasenclever, Michael Neunert, Francis Song, Martina Zambelli, Murilo Martins, Nicolas Heess, Raia Hadsell, Martin Riedmiller
ICML 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the effectiveness of our approach on challenging high-dimensional real and simulated robotics tasks, and show that setting different preferences in our framework allows us to trace out the space of nondominated solutions. |
| Researcher Affiliation | Industry | DeepMind. Correspondence to: Abbas Abdolmaleki <aabdolmaleki@google.com>, Sandy H. Huang <shhuang@google.com>. |
| Pseudocode | Yes | Algorithm 1 MO-MPO: One policy improvement step |
| Open Source Code | Yes | Code for MO-MPO will be made available online. |
| Open Datasets | No | No explicit public dataset links, DOIs, or repository names are provided. The paper mentions using "motion capture reference data" from "Hasenclever et al. (2020)" and "treasure values in Yang et al. (2019)" but does not provide direct access information for these data sources. |
| Dataset Splits | No | The paper does not specify training, validation, or test dataset splits (e.g., percentages or sample counts) for reproducibility. |
| Hardware Specification | No | The paper does not specify GPU or CPU models, nor any other details of the computing resources used for the experiments. |
| Software Dependencies | Yes | We use CVXOPT (Andersen et al., 2020) as our convex optimization solver. |
| Experiment Setup | Yes | We set ϵ = 0.01 for scalarized MPO. If we start with a uniform policy and run MPO with β = 0.001 until the policy converges... For MO-V-MPO, we set all ϵ_k = 0.01. Also, for each objective, we set ϵ_k = 0.001 and set all others to 0.01. ...for MO-MPO we set ϵ_task = 0.1 and ϵ_force = 0.05, and for scalarized MPO we try [w_task, w_force] = [0.95, 0.05] and [0.8, 0.2]. |
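The Experiment Setup row above quotes the KL bounds and scalarization weights used in the paper's comparisons. The snippet below simply collects those quoted numbers in one place; the dictionary layout and key names are hypothetical conveniences, not the authors' configuration format.

```python
# Hypothetical configuration layout; only the numeric values come from the quoted text.
EXPERIMENT_SETUP = {
    "scalarized_mpo": {
        "epsilon": 0.01,                               # "We set eps = 0.01 for scalarized MPO."
        "weights_tried": [[0.95, 0.05], [0.80, 0.20]], # [w_task, w_force] in the force/task comparison
    },
    "mpo_from_uniform_policy": {
        "beta": 0.001,                                 # beta used when running MPO from a uniform policy
    },
    "mo_v_mpo": {
        "epsilon_k_default": 0.01,                     # "we set all eps_k = 0.01"
        "epsilon_k_emphasized": 0.001,                 # per-objective runs: one eps_k = 0.001, the rest 0.01
    },
    "mo_mpo_force_task": {
        "epsilon_task": 0.1,
        "epsilon_force": 0.05,
    },
}
```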
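The Pseudocode row points to Algorithm 1 (MO-MPO: one policy improvement step). As a reading aid, the sketch below illustrates the per-objective reweighting that underpins such a step, under several assumptions: Q-values for a batch of sampled actions are given per objective, the temperature η_k is found by minimizing an MPO-style dual subject to the KL bound ϵ_k (here via a coarse grid search rather than the convex solver the paper mentions), and the subsequent weighted maximum-likelihood fit of the parametric policy is omitted. All names are hypothetical; this is not the authors' implementation.

```python
import numpy as np

def temperature_dual(eta, q_values, epsilon):
    """MPO-style dual for one objective: g(eta) = eta*epsilon + eta*log mean_a exp(Q(a)/eta)."""
    z = q_values / eta
    log_mean_exp = np.max(z) + np.log(np.mean(np.exp(z - np.max(z))))  # numerically stable
    return eta * epsilon + eta * log_mean_exp

def solve_temperature(q_values, epsilon, etas=np.geomspace(1e-3, 1e2, 200)):
    """Coarse grid search over eta; the paper reports using CVXOPT for its convex optimization instead."""
    duals = [temperature_dual(eta, q_values, epsilon) for eta in etas]
    return etas[int(np.argmin(duals))]

def mo_mpo_reweight(q_per_objective, epsilons):
    """Reweighting stage of one multi-objective policy-improvement step (sketch).

    q_per_objective: objective name -> Q-values of sampled actions for one state.
    epsilons: objective name -> KL bound eps_k encoding that objective's preference.
    Returns per-objective normalized weights proportional to exp(Q/eta_k); fitting the
    parametric policy to the weighted samples (with a trust region) would follow and is omitted.
    """
    weights = {}
    for name, q in q_per_objective.items():
        eta_k = solve_temperature(q, epsilons[name])
        w = np.exp((q - np.max(q)) / eta_k)
        weights[name] = w / np.sum(w)
    return weights

# Hypothetical usage: two objectives, 64 sampled actions for a single state.
rng = np.random.default_rng(0)
q_samples = {"task": rng.normal(size=64), "force": rng.normal(size=64)}
kl_bounds = {"task": 0.1, "force": 0.05}  # the values quoted in the Experiment Setup row
weights = mo_mpo_reweight(q_samples, kl_bounds)
print({name: w[:3] for name, w in weights.items()})
```

In the paper's framework, a larger ϵ_k gives an objective more influence on the improved policy, which is how preferences across objectives are encoded.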