Dynamic Weights in Multi-Objective Deep Reinforcement Learning

Authors: Axel Abels, Diederik Roijers, Tom Lenaerts, Ann Nowé, Denis Steckelmacher

ICML 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We perform an extensive experimental evaluation and compare our methods to adapted algorithms from Deep Multi-Task/Multi-Objective RL and show that our proposed network in combination with DER dominates these adapted algorithms across weight change scenarios and problem domains.
Researcher Affiliation | Academia | 1 Machine Learning Group, Université Libre de Bruxelles, Brussels, Belgium; 2 Artificial Intelligence Lab, Vrije Universiteit Brussel, Brussels, Belgium; 3 Computational Intelligence, Vrije Universiteit Amsterdam, Amsterdam, the Netherlands.
Pseudocode | Yes | Please see Appendix 1.3 for a detailed description of the CN algorithm. Please see Appendix 1.5 for a detailed description of the DER algorithm. Please see Appendix 1.4 for a detailed description of the MN algorithm.
Open Source Code | Yes | The code can be found at https://github.com/axelabels/DynMORL
Open Datasets | Yes | We test the performance of our algorithms on two different problems: the image version of Deep Sea Treasure (DST) proposed by Mossalam et al. (2016), and our newly proposed benchmark, the Minecart problem. The code can be found at https://github.com/axelabels/DynMORL
Dataset Splits | No | The paper describes experimental scenarios but does not specify explicit training, validation, or test dataset splits in terms of percentages or sample counts, which is typical for continuous reinforcement learning environments.
Hardware Specification | No | The paper does not provide specific details about the hardware used for running experiments (e.g., GPU models, CPU types, or memory specifications).
Software Dependencies | No | The paper does not mention specific software dependencies with version numbers (e.g., Python version, library versions like TensorFlow or PyTorch).
Experiment Setup | Yes | First, we evaluate the performance for sparse and large weight changes; the current weight, w, is randomly sampled from a Dirichlet distribution (α = 1) every 50k steps for Minecart and 5k steps for DST. Second, we test on regular weight changes; w linearly moves to a random target, w′, over 10 episodes, after which a new w′ is sampled. All algorithms are run with and without DER and with prioritized sampling (Schaul et al., 2015b).
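
The experiment-setup row describes two weight-change schedules concretely enough to sketch. Below is a minimal NumPy sketch of both, under stated assumptions: the function names, the random seed, and the choice to interpolate once per episode are ours, not taken from the paper or the released code.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen arbitrarily; not specified in the paper

def sample_weight(n_objectives):
    """Draw a weight vector from a flat Dirichlet (alpha = 1):
    entries are non-negative and sum to 1."""
    return rng.dirichlet(np.ones(n_objectives))

def sparse_weight_schedule(n_objectives, total_steps, interval):
    """Scenario 1: sparse, large changes. Resample w every `interval`
    environment steps (50k for Minecart, 5k for DST in the paper)."""
    w = sample_weight(n_objectives)
    for step in range(total_steps):
        if step > 0 and step % interval == 0:
            w = sample_weight(n_objectives)
        yield w

def regular_weight_schedule(n_objectives, n_targets, episodes_per_target=10):
    """Scenario 2: regular changes. w moves linearly toward a random
    target w' over `episodes_per_target` episodes, after which a new
    target is drawn."""
    w = sample_weight(n_objectives)
    for _ in range(n_targets):
        target = sample_weight(n_objectives)
        for ep in range(1, episodes_per_target + 1):
            # Linear interpolation between the previous weight and the target.
            yield w + (target - w) * (ep / episodes_per_target)
        w = target
```

For example, `regular_weight_schedule(2, n_targets=5)` would produce a drifting weight sequence for the two-objective DST problem; Minecart would use a larger `n_objectives`.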