PD-MORL: Preference-Driven Multi-Objective Reinforcement Learning Algorithm
Authors: Toygun Basaklar, Suat Gumussoy, Umit Ogras
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | This section extensively evaluates the proposed PD-MORL technique using commonly used MORL benchmarks with discrete state-action spaces (Section 5.1) and complex MORL environments with continuous state-action spaces (Section 5.2). |
| Researcher Affiliation | Collaboration | Toygun Basaklar, UW-Madison, Madison, WI 53706, basaklar@wisc.edu; Suat Gumussoy, Siemens Corporate Technology, Princeton, NJ 08540, suat.gumussoy@siemens.com; Umit Y. Ogras, UW-Madison, Madison, WI 53706, uogras@wisc.edu |
| Pseudocode | Yes | Algorithm 1: Preference Driven MO-DDQN-HER |
| Open Source Code | Yes | The source code is attached with the rest of the supplementary material, providing a complete description of the multi-objective RL environments and instructions on reproducing our experiments. |
| Open Datasets | Yes | We first evaluate PD-MORL's performance on two commonly used discrete MORL benchmarks: Deep Sea Treasure (Hayes et al., 2022) and Fruit Tree Navigation (Yang et al., 2019). |
| Dataset Splits | No | The paper describes the benchmarks used (e.g., Deep Sea Treasure, Fruit Tree Navigation, MO-Walker2d-v2) but does not provide explicit details about how these datasets were split into training, validation, or test sets (e.g., specific percentages or sample counts) for their experiments. |
| Hardware Specification | Yes | We run all our experiments on a local server with an Intel Xeon Gold 6242R CPU. |
| Software Dependencies | No | The paper mentions using radial basis function interpolation and cites 'SciPy 1.0: Fundamental Algorithms for Scientific Computing in Python, 2020', but it does not list specific software dependencies with version numbers (a minimal illustration of the cited SciPy RBF interpolation is sketched below the table). |
| Experiment Setup | Yes | Table 4: Hyperparameters for MO-DDQN-HER |
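The Software Dependencies row notes that the paper relies on SciPy's radial basis function interpolation without pinning a version. The sketch below only illustrates that dependency; the key preferences, return values, and array shapes are hypothetical placeholders, not taken from the paper or its released code.

```python
import numpy as np
from scipy.interpolate import RBFInterpolator  # requires SciPy >= 1.7

# Hypothetical key preferences for a two-objective task, parameterized by the
# weight of the first objective (the second weight is 1 - w1).
key_w1 = np.array([[0.0], [0.25], [0.5], [0.75], [1.0]])        # shape (5, 1)

# Hypothetical multi-objective returns observed at those key preferences.
key_returns = np.array([[0.5, 9.0], [2.0, 7.5], [4.0, 5.0],
                        [7.0, 2.5], [9.5, 0.5]])                 # shape (5, 2)

# Fit an RBF interpolant over the key preferences and query it at
# intermediate preferences not seen during training.
interp = RBFInterpolator(key_w1, key_returns, kernel='thin_plate_spline')
query_w1 = np.array([[0.1], [0.6]])
print(interp(query_w1))  # interpolated returns, shape (2, 2)
```

Because the paper does not pin a SciPy version, note that `RBFInterpolator` is only available in SciPy 1.7 and later; older releases expose the legacy `scipy.interpolate.Rbf` interface instead.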