Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Distributional Reinforcement Learning for Multi-Dimensional Reward Functions
Authors: Pushi Zhang, Xiaoyu Chen, Li Zhao, Wei Xiong, Tao Qin, Tie-Yan Liu
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In our experiments, we provide empirical results to answer the following questions: (1) In policy evaluation settings, can MD3QN accurately model the joint distribution of multiple sources of reward? (2) In policy optimization settings, can MD3QN learn a better policy compared to HRA and distributional RL algorithms on environments with multiple sources of reward? |
| Researcher Affiliation | Collaboration | Pushi Zhang (Tsinghua University); Xiaoyu Chen (Tsinghua University); Li Zhao (Microsoft Research Asia); Wei Xiong (The Hong Kong University of Science and Technology); Tao Qin (Microsoft Research Asia); Tie-Yan Liu (Microsoft Research Asia) |
| Pseudocode | Yes | Algorithm 1 Gradient estimation of MMD2 loss by transition samples |
| Open Source Code | Yes | Our code for the experiments is available in the supplementary material. |
| Open Datasets | Yes | On Atari games from the Arcade Learning Environment (Bellemare et al., 2013). We use six Atari games with multiple sources of rewards: Air Raid, Asteroids, Gopher, Ms Pacman, UpNDown, and Pong. |
| Dataset Splits | No | The paper does not explicitly provide specific training/validation/test dataset splits, percentages, or counts. It only refers to 'training curve results' on Atari games. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running its experiments, such as exact GPU/CPU models, processor types, or memory. |
| Software Dependencies | No | The paper mentions that 'Our implementation of MD3QN is built upon the Dopamine framework (Castro et al., 2018)', but it does not provide specific version numbers for Dopamine or any other software dependencies needed for reproducibility. |
| Experiment Setup | Yes | The hyper-parameter settings used by MD3QN and the environment settings are detailed in Appendix A.2. |
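The pseudocode row above cites "Algorithm 1: Gradient estimation of MMD² loss by transition samples." As a hedged illustration only (not the authors' implementation, which is in their supplementary code), the empirical biased MMD² between two sample sets under a Gaussian kernel — the quantity whose gradient such an algorithm estimates — can be sketched in NumPy as follows; the function names and the fixed bandwidth are assumptions for this sketch:

```python
import numpy as np

def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian kernel matrix k(x_i, y_j) = exp(-||x_i - y_j||^2 / (2 * bandwidth^2)).

    x: (n, d) samples, y: (m, d) samples. Returns an (n, m) matrix.
    """
    diff = x[:, None, :] - y[None, :, :]          # (n, m, d) pairwise differences
    sq_dists = np.sum(diff ** 2, axis=-1)         # (n, m) squared distances
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))

def mmd2(x, y, bandwidth=1.0):
    """Biased empirical MMD^2 between sample sets x (n, d) and y (m, d).

    MMD^2 = E[k(x, x')] + E[k(y, y')] - 2 E[k(x, y)], estimated by sample means.
    In a distributional RL setting, x would be samples from the predicted joint
    return distribution and y samples from the bootstrapped target distribution.
    """
    k_xx = gaussian_kernel(x, x, bandwidth)
    k_yy = gaussian_kernel(y, y, bandwidth)
    k_xy = gaussian_kernel(x, y, bandwidth)
    return k_xx.mean() + k_yy.mean() - 2.0 * k_xy.mean()
```

Identical sample sets yield an MMD² of zero, and well-separated sets approach 2 under this kernel; in practice the estimator would be written in an autodiff framework (the paper builds on Dopamine) so its gradient with respect to the predicted samples is obtained automatically.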