PD-MORL: Preference-Driven Multi-Objective Reinforcement Learning Algorithm

Authors: Toygun Basaklar, Suat Gumussoy, Umit Ogras

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "This section extensively evaluates the proposed PD-MORL technique using commonly used MORL benchmarks with discrete state-action spaces (Section 5.1) and complex MORL environments with continuous state-action spaces (Section 5.2)."
Researcher Affiliation | Collaboration | Toygun Basaklar (UW-Madison, Madison, WI 53706, basaklar@wisc.edu); Suat Gumussoy (Siemens Corporate Technology, Princeton, NJ 08540, suat.gumussoy@siemens.com); Umit Y. Ogras (UW-Madison, Madison, WI 53706, uogras@wisc.edu)
Pseudocode | Yes | "Algorithm 1: Preference Driven MO-DDQN-HER" (a hedged sketch of this style of update follows the table)
Open Source Code | Yes | "The source code is attached with the rest of the supplementary material, providing a complete description of the multi-objective RL environments and instructions on reproducing our experiments."
Open Datasets | Yes | "We first evaluate PD-MORL's performance on two commonly used discrete MORL benchmarks: Deep Sea Treasure (Hayes et al., 2022) and Fruit Tree Navigation (Yang et al., 2019)."
Dataset Splits | No | The paper describes the benchmarks used (e.g., Deep Sea Treasure, Fruit Tree Navigation, MO-Walker2d-v2) but does not state how these were split into training, validation, or test sets (e.g., specific percentages or sample counts) for its experiments.
Hardware Specification | Yes | "We run all our experiments on a local server including Intel Xeon Gold 6242R."
Software Dependencies | No | The paper mentions using radial basis function interpolation and cites "SciPy 1.0: Fundamental Algorithms for Scientific Computing in Python" (2020), but does not explicitly list specific software dependencies with version numbers (see the interpolation example after the table).
Experiment Setup | Yes | "Table 4: Hyperparameters for MO-DDQN-HER"
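The Pseudocode row above points to Algorithm 1, "Preference Driven MO-DDQN-HER". For orientation, below is a minimal sketch of the generic building block such an algorithm rests on: a double-DQN update for a Q-network conditioned on both the observation and a preference vector, with the multi-objective Q-values scalarized by that preference. This is an assumption-laden illustration, not the authors' implementation; it omits the preference-driven alignment term and the hindsight preference relabeling implied by "HER", and names such as `QNet` and `ddqn_update` are hypothetical.

```python
# Illustrative sketch only (PyTorch): a preference-conditioned multi-objective
# double-DQN update. NOT the authors' Algorithm 1; the preference-driven
# alignment term and HER-style preference relabeling are omitted.
import torch
import torch.nn as nn

class QNet(nn.Module):
    """Q-network conditioned on the observation and a preference vector w.
    Returns a multi-objective Q-value vector for every discrete action."""
    def __init__(self, obs_dim, n_actions, n_objectives, hidden=256):
        super().__init__()
        self.n_actions, self.n_objectives = n_actions, n_objectives
        self.net = nn.Sequential(
            nn.Linear(obs_dim + n_objectives, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions * n_objectives),
        )

    def forward(self, obs, w):
        q = self.net(torch.cat([obs, w], dim=-1))
        return q.view(-1, self.n_actions, self.n_objectives)   # (B, A, O)

def ddqn_update(q_net, target_net, optimizer, batch, gamma=0.99):
    """One scalarized double-DQN step. `batch` holds float tensors
    (obs, w, r_vec, next_obs, done) and a long tensor act."""
    obs, w, act, r_vec, next_obs, done = batch
    batch_idx = torch.arange(obs.shape[0])
    with torch.no_grad():
        # Online network picks the greedy action under the scalarized value w·Q.
        next_scalar = (q_net(next_obs, w) * w.unsqueeze(1)).sum(-1)   # (B, A)
        a_star = next_scalar.argmax(dim=1)                            # (B,)
        # Target network evaluates that action (the double-DQN decoupling).
        q_next = target_net(next_obs, w)[batch_idx, a_star]           # (B, O)
        target = r_vec + gamma * (1.0 - done).unsqueeze(-1) * q_next
    q_pred = q_net(obs, w)[batch_idx, act]                            # (B, O)
    loss = nn.functional.mse_loss(q_pred, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In a full training loop the replay buffer would presumably also store the preference in use when each transition was collected, so that transitions can be relabeled with alternative preferences before this update is applied.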
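The Software Dependencies row notes that radial basis function interpolation is used and that SciPy is cited, but no versions are pinned. As a point of reference, the snippet below shows the standard SciPy entry point for that operation, `scipy.interpolate.RBFInterpolator` (available since SciPy 1.7). The data and variable names (`key_weights`, `key_values`) are synthetic placeholders, not the quantities the authors actually interpolate.

```python
# Hedged example of radial basis function interpolation with SciPy.
import numpy as np
from scipy.interpolate import RBFInterpolator

rng = np.random.default_rng(0)
# Two-objective preferences parameterized by their first weight w1 (w2 = 1 - w1).
key_weights = np.linspace(0.0, 1.0, 11)[:, None]   # (11, 1) interpolation nodes
key_values = rng.normal(size=(11, 2))               # synthetic quantities at each node

rbf = RBFInterpolator(key_weights, key_values)      # default thin-plate-spline kernel
queries = np.array([[0.25], [0.6]])
print(rbf(queries))                                 # interpolated values at unseen preferences
```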