Distributional Reinforcement Learning for Multi-Dimensional Reward Functions

Authors: Pushi Zhang, Xiaoyu Chen, Li Zhao, Wei Xiong, Tao Qin, Tie-Yan Liu

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In our experiments, we provide empirical results to answer the following questions: (1) In the policy evaluation setting, can MD3QN accurately model the joint distribution of multiple sources of reward? (2) In the policy optimization setting, can MD3QN learn a better policy compared to HRA and distributional RL algorithms on environments with multiple sources of reward?
Researcher Affiliation | Collaboration | Pushi Zhang (Tsinghua University, zpschang@gmail.com); Xiaoyu Chen (Tsinghua University, chen-xy21@mails.tsinghua.edu.cn); Li Zhao (Microsoft Research Asia, lizo@microsoft.com); Wei Xiong (The Hong Kong University of Science and Technology, wxiongae@connect.ust.hk); Tao Qin (Microsoft Research Asia, taoqin@microsoft.com); Tie-Yan Liu (Microsoft Research Asia, tyliu@microsoft.com)
Pseudocode | Yes | Algorithm 1: Gradient estimation of the MMD² loss from transition samples.
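For reference, a minimal sketch of the kind of sample-based squared-MMD estimator Algorithm 1 builds on is shown below. The Gaussian kernel, the single fixed bandwidth, and all function names here are our assumptions; the paper's actual algorithm and kernel choices may differ (e.g., it may mix several bandwidths).

```python
import torch

def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian kernel matrix between sample sets x: (n, d) and y: (m, d)."""
    sq_dists = torch.cdist(x, y, p=2.0) ** 2
    return torch.exp(-sq_dists / (2.0 * bandwidth ** 2))

def mmd2_loss(pred_samples, target_samples, bandwidth=1.0):
    """Biased estimator of squared MMD between predicted and target samples.

    pred_samples:   (n, d) samples from the predicted joint return distribution
    target_samples: (m, d) samples from the Bellman target
    """
    # Treat the Bellman target as a constant so gradients flow only
    # through the predicted samples.
    target_samples = target_samples.detach()
    k_xx = gaussian_kernel(pred_samples, pred_samples, bandwidth).mean()
    k_yy = gaussian_kernel(target_samples, target_samples, bandwidth).mean()
    k_xy = gaussian_kernel(pred_samples, target_samples, bandwidth).mean()
    return k_xx + k_yy - 2.0 * k_xy
```

Backpropagating through this scalar yields the gradient estimate from transition samples that the pseudocode's title refers to.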
Open Source Code | Yes | Our code for the experiments is available in the supplementary material.
Open Datasets | Yes | Atari games from the Arcade Learning Environment (Bellemare et al., 2013). The paper uses six Atari games with multiple sources of reward: Air Raid, Asteroids, Gopher, Ms Pacman, UpNDown, and Pong.
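For anyone re-running these environments, a hedged sketch of instantiating the six games through Gym's classic Atari IDs is shown below; the "NoFrameskip-v4" naming and the gym/ale-py install are our assumptions, and the paper's own preprocessing lives in its Dopamine-based code.

```python
# Requires the Atari extras: pip install "gym[atari]"
import gym

GAMES = ["AirRaid", "Asteroids", "Gopher", "MsPacman", "UpNDown", "Pong"]
envs = {name: gym.make(f"{name}NoFrameskip-v4") for name in GAMES}

obs = envs["Pong"].reset()  # newer gym versions return (obs, info) instead
```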
Dataset Splits | No | The paper does not provide explicit training/validation/test dataset splits, percentages, or counts; it only refers to 'training curve results' on Atari games.
Hardware Specification | No | The paper does not report the hardware used for its experiments, such as exact GPU/CPU models, processor types, or memory.
Software Dependencies | No | The paper states that 'Our implementation of MD3QN is built upon the Dopamine framework (Castro et al., 2018)', but it gives no version numbers for Dopamine or any other dependency needed for reproducibility.
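A reproducer would therefore have to pin versions themselves. One way to snapshot whatever versions resolve locally is sketched below; the package list is illustrative, not taken from the paper (Dopamine is distributed on PyPI as dopamine-rl).

```python
# Illustrative only: record the dependency versions that actually resolved
# locally, since the paper does not pin any. Package names are assumptions.
import importlib.metadata as md

for pkg in ["dopamine-rl", "tensorflow", "gym", "numpy"]:
    try:
        print(pkg, md.version(pkg))
    except md.PackageNotFoundError:
        print(pkg, "not installed")
```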
Experiment Setup | Yes | The hyper-parameter settings used by MD3QN and the environment settings are detailed in Appendix A.2.
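Dopamine experiments are normally configured through gin files, so the Appendix A.2 settings would typically be expressed that way. A hypothetical sketch follows; the config filename and output directory are placeholders, not from the paper.

```python
# Hypothetical sketch of wiring Appendix A.2 hyper-parameters into a
# Dopamine-style run; "md3qn_atari.gin" and base_dir are placeholders.
import gin
from dopamine.discrete_domains import run_experiment

gin.parse_config_file("md3qn_atari.gin")  # placeholder config name
runner = run_experiment.create_runner(
    base_dir="/tmp/md3qn",                # placeholder output directory
    schedule="continuous_train_and_eval",
)
runner.run_experiment()
```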