Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Distributional Reinforcement Learning for Multi-Dimensional Reward Functions
Authors: Pushi Zhang, Xiaoyu Chen, Li Zhao, Wei Xiong, Tao Qin, Tie-Yan Liu
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In our experiments, we provide empirical results to answer the following questions: (1) In policy evaluation settings, can MD3QN accurately model the joint distribution of multiple sources of reward? (2) In policy optimization settings, can MD3QN learn a better policy compared to HRA and distributional RL algorithms on environments with multiple sources of reward? |
| Researcher Affiliation | Collaboration | Pushi Zhang (Tsinghua University); Xiaoyu Chen (Tsinghua University); Li Zhao (Microsoft Research Asia); Wei Xiong (The Hong Kong University of Science and Technology); Tao Qin (Microsoft Research Asia); Tie-Yan Liu (Microsoft Research Asia) |
| Pseudocode | Yes | Algorithm 1 Gradient estimation of MMD2 loss by transition samples |
| Open Source Code | Yes | Our code for the experiments is available in the supplementary material. |
| Open Datasets | Yes | On Atari games from the Arcade Learning Environment (Bellemare et al., 2013). We use six Atari games with multiple sources of rewards: Air Raid, Asteroids, Gopher, Ms Pacman, UpNDown, and Pong. |
| Dataset Splits | No | The paper does not explicitly provide specific training/validation/test dataset splits, percentages, or counts. It only refers to 'training curve results' on Atari games. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running its experiments, such as exact GPU/CPU models, processor types, or memory. |
| Software Dependencies | No | The paper mentions that 'Our implementation of MD3QN is built upon the Dopamine framework (Castro et al., 2018)', but it does not provide specific version numbers for Dopamine or any other software dependencies needed for reproducibility. |
| Experiment Setup | Yes | The hyper-parameter settings used by MD3QN and the environment settings are detailed in Appendix A.2. |
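The pseudocode row above cites "Algorithm 1: Gradient estimation of MMD² loss by transition samples." As a hedged illustration only (not the authors' implementation, which is in their supplementary code), the empirical biased MMD² between two sample sets under a Gaussian kernel — the quantity whose gradient such an algorithm estimates — can be sketched in NumPy as follows; the function names and the fixed bandwidth are assumptions for this sketch:

```python
import numpy as np

def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian kernel matrix k(x_i, y_j) = exp(-||x_i - y_j||^2 / (2 * bandwidth^2)).

    x: (n, d) samples, y: (m, d) samples. Returns an (n, m) matrix.
    """
    diff = x[:, None, :] - y[None, :, :]          # (n, m, d) pairwise differences
    sq_dists = np.sum(diff ** 2, axis=-1)         # (n, m) squared distances
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))

def mmd2(x, y, bandwidth=1.0):
    """Biased empirical MMD^2 between sample sets x (n, d) and y (m, d).

    MMD^2 = E[k(x, x')] + E[k(y, y')] - 2 E[k(x, y)], estimated by sample means.
    In a distributional RL setting, x would be samples from the predicted joint
    return distribution and y samples from the bootstrapped target distribution.
    """
    k_xx = gaussian_kernel(x, x, bandwidth)
    k_yy = gaussian_kernel(y, y, bandwidth)
    k_xy = gaussian_kernel(x, y, bandwidth)
    return k_xx.mean() + k_yy.mean() - 2.0 * k_xy.mean()
```

Identical sample sets yield an MMD² of zero, and well-separated sets approach 2 under this kernel; in practice the estimator would be written in an autodiff framework (the paper builds on Dopamine) so its gradient with respect to the predicted samples is obtained automatically.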