Dynamic Weights in Multi-Objective Deep Reinforcement Learning

Authors: Axel Abels, Diederik Roijers, Tom Lenaerts, Ann Nowé, Denis Steckelmacher

ICML 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We perform an extensive experimental evaluation and compare our methods to adapted algorithms from Deep Multi-Task/Multi-Objective RL and show that our proposed network in combination with DER dominates these adapted algorithms across weight change scenarios and problem domains.
Researcher Affiliation | Academia | 1 Machine Learning Group, Université Libre de Bruxelles, Brussels, Belgium; 2 Artificial Intelligence Lab, Vrije Universiteit Brussel, Brussels, Belgium; 3 Computational Intelligence, Vrije Universiteit Amsterdam, Amsterdam, the Netherlands.
Pseudocode | Yes | Please see Appendix 1.3 for a detailed description of the CN algorithm. Please see Appendix 1.5 for a detailed description of the DER algorithm. Please see Appendix 1.4 for a detailed description of the MN algorithm.
Open Source Code | Yes | The code can be found at https://github.com/axelabels/DynMORL
Open Datasets | Yes | We test the performance of our algorithms on two different problems: the image version of Deep Sea Treasure (DST) proposed by Mossalam et al. (2016), and our newly proposed benchmark, the Minecart problem. The code can be found at https://github.com/axelabels/DynMORL
Dataset Splits | No | The paper describes experimental scenarios but does not specify explicit training, validation, or test dataset splits in terms of percentages or sample counts, which is typical for continuous reinforcement learning environments.
Hardware Specification | No | The paper does not provide specific details about the hardware used for running experiments (e.g., GPU models, CPU types, or memory specifications).
Software Dependencies | No | The paper does not mention specific software dependencies with version numbers (e.g., Python version, library versions like TensorFlow or PyTorch).
Experiment Setup | Yes | First, we evaluate the performance for sparse and large weight changes; the current weight, w, is randomly sampled from a Dirichlet distribution (α = 1) every 50k steps for Minecart and 5k steps for DST. Second, we test on regular weight changes; w linearly moves to a random target, w′, over 10 episodes, after which a new w′ is sampled. All algorithms are run with and without DER and with prioritized sampling (Schaul et al., 2015b).
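
The experiment-setup row describes two weight-change schedules concretely enough to sketch. Below is a minimal NumPy sketch of both, under stated assumptions: the function names, the random seed, and the choice to interpolate once per episode are ours, not taken from the paper or the released code.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen arbitrarily; not specified in the paper

def sample_weight(n_objectives):
    """Draw a weight vector from a flat Dirichlet (alpha = 1):
    entries are non-negative and sum to 1."""
    return rng.dirichlet(np.ones(n_objectives))

def sparse_weight_schedule(n_objectives, total_steps, interval):
    """Scenario 1: sparse, large changes. Resample w every `interval`
    environment steps (50k for Minecart, 5k for DST in the paper)."""
    w = sample_weight(n_objectives)
    for step in range(total_steps):
        if step > 0 and step % interval == 0:
            w = sample_weight(n_objectives)
        yield w

def regular_weight_schedule(n_objectives, n_targets, episodes_per_target=10):
    """Scenario 2: regular changes. w moves linearly toward a random
    target w' over `episodes_per_target` episodes, after which a new
    target is drawn."""
    w = sample_weight(n_objectives)
    for _ in range(n_targets):
        target = sample_weight(n_objectives)
        for ep in range(1, episodes_per_target + 1):
            # Linear interpolation between the previous weight and the target.
            yield w + (target - w) * (ep / episodes_per_target)
        w = target
```

For example, `regular_weight_schedule(2, n_targets=5)` would produce a drifting weight sequence for the two-objective DST problem; Minecart would use a larger `n_objectives`.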