Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Distributional Reinforcement Learning for Multi-Dimensional Reward Functions

Authors: Pushi Zhang, Xiaoyu Chen, Li Zhao, Wei Xiong, Tao Qin, Tie-Yan Liu

NeurIPS 2021 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental In our experiments, we provide empirical results to answer the following questions: (1) In policy evaluation settings, can MD3QN accurately model the joint distribution of multiple sources of reward? (2) In policy optimization settings, can MD3QN learn a better policy than HRA and distributional RL algorithms on environments with multiple sources of reward?
Researcher Affiliation Collaboration Pushi Zhang (Tsinghua University, EMAIL); Xiaoyu Chen (Tsinghua University, EMAIL); Li Zhao (Microsoft Research Asia, EMAIL); Wei Xiong (The Hong Kong University of Science and Technology, EMAIL); Tao Qin (Microsoft Research Asia, EMAIL); Tie-Yan Liu (Microsoft Research Asia, EMAIL)
Pseudocode Yes Algorithm 1: "Gradient estimation of MMD² loss by transition samples"
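For context on the pseudocode referenced above: the paper's Algorithm 1 estimates gradients of a squared maximum mean discrepancy (MMD²) loss from transition samples. Below is a minimal sketch of the standard empirical (biased) MMD² estimator with a Gaussian kernel — an illustration of the underlying quantity, not a reproduction of the paper's exact algorithm; the function names, bandwidth choice, and kernel are our assumptions.

```python
import numpy as np

def gaussian_kernel(a, b, bandwidth=1.0):
    # Pairwise Gaussian kernel values between rows of a and b.
    # Illustrative choice of kernel/bandwidth, not the paper's setting.
    d2 = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return np.exp(-d2 / (2.0 * bandwidth ** 2))

def mmd2(x, y, bandwidth=1.0):
    # Biased empirical estimate of the squared MMD between sample sets
    # x and y: E[k(x,x')] + E[k(y,y')] - 2 E[k(x,y)].
    kxx = gaussian_kernel(x, x, bandwidth).mean()
    kyy = gaussian_kernel(y, y, bandwidth).mean()
    kxy = gaussian_kernel(x, y, bandwidth).mean()
    return kxx + kyy - 2.0 * kxy
```

Identical sample sets give an MMD² of zero, while samples drawn from well-separated distributions give a strictly positive value, which is what makes the quantity usable as a training loss over sampled transitions.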
Open Source Code Yes Our code for the experiments is available in the supplementary material.
Open Datasets Yes Experiments are run on Atari games from the Arcade Learning Environment (Bellemare et al., 2013). We use six Atari games with multiple sources of reward: Air Raid, Asteroids, Gopher, Ms Pacman, UpNDown, and Pong.
Dataset Splits No The paper does not explicitly provide training/validation/test dataset splits, percentages, or counts; it only refers to 'training curve results' on Atari games.
Hardware Specification No The paper does not provide specific details about the hardware used for running its experiments, such as exact GPU/CPU models, processor types, or memory.
Software Dependencies No The paper mentions that 'Our implementation of MD3QN is built upon the Dopamine framework (Castro et al., 2018)', but it does not provide specific version numbers for Dopamine or any other software dependencies needed for reproducibility.
Experiment Setup Yes The hyper-parameter settings used by MD3QN and the environment settings are detailed in Appendix A.2.