RD$^2$: Reward Decomposition with Representation Decomposition
Authors: Zichuan Lin, Derek Yang, Li Zhao, Tao Qin, Guangwen Yang, Tie-Yan Liu
NeurIPS 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | RD2 is evaluated on a toy case, where the true reward structure is known, and on chosen Atari environments, where the reward structure exists but is unknown to the agent, to demonstrate the effectiveness of RD2 against existing reward decomposition methods. |
| Researcher Affiliation | Collaboration | Zichuan Lin (Tsinghua University) lzcthu12@gmail.com; Derek Yang (UC San Diego) dyang1206@gmail.com; Li Zhao (Microsoft Research) lizo@microsoft.com; Tao Qin (Microsoft Research) taoqin@microsoft.com; Guangwen Yang (Tsinghua University) ygw@tsinghua.edu.cn; Tie-Yan Liu (Microsoft Research) tyliu@microsoft.com |
| Pseudocode | Yes | We provide the pseudo code of our algorithm in Appendix 1. |
| Open Source Code | No | The paper does not provide an explicit statement about releasing source code or a link to a code repository. |
| Open Datasets | Yes | RD2 is evaluated on a toy case, where we have the true reward structure, and chosen Atari environments... mini-gridworld [Chevalier-Boisvert et al., 2018], configured to the Monster-Treasure environment discussed earlier as shown in Figure 1. ... We experiment with the Atari games that have a structure of multiple reward sources. |
| Dataset Splits | No | The paper mentions using mini-gridworld and Atari environments but does not specify exact training, validation, or test split percentages or sample counts for these datasets. |
| Hardware Specification | No | The paper does not specify any hardware details (e.g., GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper refers to existing research frameworks and algorithms like 'deep Q-learning', 'Dopamine', 'Adam', and 'Rainbow', but it does not list specific software dependencies with version numbers used for its own implementation (e.g., Python 3.x, PyTorch 1.x). |
| Experiment Setup | No | The paper states that 'For a detailed version of combining RD2 with Q-learning, please refer to Appendix A,' suggesting setup details might be there. However, the main text does not provide specific hyperparameter values (e.g., learning rate, batch size, number of epochs) or explicit training configuration settings. |
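
The Pseudocode and Experiment Setup rows indicate that the algorithmic details live in the paper's appendix rather than in released code. To make the reward-decomposition idea concrete, below is a minimal sketch of sub-Q heads built on a split representation and trained with a one-step TD loss, in the spirit of RD2's combination with Q-learning. All class names, network sizes, and the soft gating masks are hypothetical illustrations, not the authors' implementation or hyperparameters.

```python
# Minimal sketch of reward decomposition on top of Q-learning, in the spirit of
# RD2's sub-state / sub-Q structure. Names, sizes, and the gating scheme are
# hypothetical and do not reproduce the paper's architecture or settings.
import torch
import torch.nn as nn
import torch.nn.functional as F


class DecomposedQNet(nn.Module):
    """Shared encoder whose representation is softly split into K sub-states,
    each feeding its own sub-Q head; the full Q-value is the sum of sub-Qs."""

    def __init__(self, obs_dim: int, n_actions: int, n_channels: int = 2, hidden: int = 128):
        super().__init__()
        self.n_channels = n_channels
        self.encoder = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        # One learnable mask per reward channel selects a sub-representation.
        self.masks = nn.Parameter(torch.zeros(n_channels, hidden))
        self.q_heads = nn.ModuleList(
            [nn.Linear(hidden, n_actions) for _ in range(n_channels)]
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        h = self.encoder(obs)                # (B, hidden)
        gates = torch.sigmoid(self.masks)    # (K, hidden), soft gating
        # Sub-Q values per channel: (B, K, n_actions)
        return torch.stack(
            [head(h * gates[k]) for k, head in enumerate(self.q_heads)], dim=1
        )

    def total_q(self, obs: torch.Tensor) -> torch.Tensor:
        return self.forward(obs).sum(dim=1)  # (B, n_actions)


def td_loss(net, target_net, batch, gamma: float = 0.99) -> torch.Tensor:
    """One-step TD loss on the summed Q-value; only the total reward is
    observed, matching the setting where sub-rewards are unknown."""
    obs, act, rew, next_obs, done = batch
    q = net.total_q(obs).gather(1, act.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        next_q = target_net.total_q(next_obs).max(dim=1).values
        target = rew + gamma * (1.0 - done) * next_q
    return F.smooth_l1_loss(q, target)


if __name__ == "__main__":
    net, target_net = DecomposedQNet(8, 4), DecomposedQNet(8, 4)
    target_net.load_state_dict(net.state_dict())
    batch = (torch.randn(32, 8), torch.randint(0, 4, (32,)),
             torch.randn(32), torch.randn(32, 8), torch.zeros(32))
    loss = td_loss(net, target_net, batch)
    loss.backward()
    print(float(loss))
```

The paper additionally trains disentanglement objectives so that each sub-state predicts its own sub-reward; that part is omitted here, since the exact losses and their weights are only given in the appendix.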
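The Open Datasets row points to mini-gridworld (Chevalier-Boisvert et al., 2018) and Atari games. As a hedged illustration of how such an environment is typically loaded, the snippet below uses the gym-minigrid package with the classic Gym API; the registered `MiniGrid-Empty-8x8-v0` task is only a stand-in, since the paper's Monster-Treasure configuration is a custom layout that is not shipped with the package.

```python
# Assumes the gym-minigrid package and the classic (pre-0.26) Gym step API.
import gym
import gym_minigrid  # noqa: F401  (importing registers the MiniGrid-* environments)

env = gym.make("MiniGrid-Empty-8x8-v0")  # stand-in; Monster-Treasure is a custom layout
obs = env.reset()
done = False
while not done:
    obs, reward, done, info = env.step(env.action_space.sample())
env.close()
```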