Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
RD$^2$: Reward Decomposition with Representation Decomposition
Authors: Zichuan Lin, Derek Yang, Li Zhao, Tao Qin, Guangwen Yang, Tie-Yan Liu
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | RD2 is evaluated on a toy case, where we have the true reward structure, and chosen Atari environments where the reward structure exists but is unknown to the agent to demonstrate the effectiveness of RD2 against existing reward decomposition methods. |
| Researcher Affiliation | Collaboration | Zichuan Lin Tsinghua University EMAIL Derek Yang UC San Diego EMAIL Li Zhao Microsoft Research EMAIL Tao Qin Microsoft Research EMAIL Guangwen Yang Tsinghua University EMAIL Tieyan Liu Microsoft Research EMAIL |
| Pseudocode | Yes | We provide the pseudo code of our algorithm in Appendix 1. |
| Open Source Code | No | The paper does not provide an explicit statement about releasing source code or a link to a code repository. |
| Open Datasets | Yes | RD2 is evaluated on a toy case, where we have the true reward structure, and chosen Atari environments... mini-gridworld [Chevalier-Boisvert et al., 2018], configured to the Monster-Treasure environment discussed earlier as shown in Figure 1. ... We experiment with the Atari games that have a structure of multiple reward sources. |
| Dataset Splits | No | The paper mentions using mini-gridworld and Atari environments but does not specify exact training, validation, or test split percentages or sample counts for these datasets. |
| Hardware Specification | No | The paper does not specify any hardware details (e.g., GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper refers to existing research frameworks and algorithms like 'deep Q-learning', 'Dopamine', 'Adam', and 'Rainbow', but it does not list specific software dependencies with version numbers used for its own implementation (e.g., Python 3.x, PyTorch 1.x). |
| Experiment Setup | No | The paper states that 'For a detailed version of combining RD2 with Q-learning, please refer to Appendix A,' suggesting setup details might be there. However, the main text does not provide specific hyperparameter values (e.g., learning rate, batch size, number of epochs) or explicit training configuration settings. |