RD$^2$: Reward Decomposition with Representation Decomposition
Authors: Zichuan Lin, Derek Yang, Li Zhao, Tao Qin, Guangwen Yang, Tie-Yan Liu
NeurIPS 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | RD2 is evaluated on a toy case, where the true reward structure is known, and on chosen Atari environments, where the reward structure exists but is unknown to the agent, to demonstrate the effectiveness of RD2 against existing reward decomposition methods. |
| Researcher Affiliation | Collaboration | Zichuan Lin (Tsinghua University) lzcthu12@gmail.com; Derek Yang (UC San Diego) dyang1206@gmail.com; Li Zhao (Microsoft Research) lizo@microsoft.com; Tao Qin (Microsoft Research) taoqin@microsoft.com; Guangwen Yang (Tsinghua University) ygw@tsinghua.edu.cn; Tie-Yan Liu (Microsoft Research) tyliu@microsoft.com |
| Pseudocode | Yes | We provide the pseudo code of our algorithm in Appendix 1. |
| Open Source Code | No | The paper does not provide an explicit statement about releasing source code or a link to a code repository. |
| Open Datasets | Yes | RD2 is evaluated on a toy case, where we have the true reward structure, and chosen Atari environments... mini-gridworld [Chevalier-Boisvert et al., 2018], configured to the Monster-Treasure environment discussed earlier as shown in Figure 1. ... We experiment with the Atari games that have a structure of multiple reward sources. |
| Dataset Splits | No | The paper mentions using mini-gridworld and Atari environments but does not specify exact training, validation, or test split percentages or sample counts for these datasets. |
| Hardware Specification | No | The paper does not specify any hardware details (e.g., GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper refers to existing research frameworks and algorithms like 'deep Q-learning', 'Dopamine', 'Adam', and 'Rainbow', but it does not list specific software dependencies with version numbers used for its own implementation (e.g., Python 3.x, PyTorch 1.x). |
| Experiment Setup | No | The paper states that 'For a detailed version of combining RD2 with Q-learning, please refer to Appendix A,' suggesting setup details might be there. However, the main text does not provide specific hyperparameter values (e.g., learning rate, batch size, number of epochs) or explicit training configuration settings. |
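
The Pseudocode and Experiment Setup rows indicate that the algorithmic details live in the paper's appendix rather than in released code. To make the reward-decomposition idea concrete, below is a minimal sketch of sub-Q heads built on a split representation and trained with a one-step TD loss, in the spirit of RD2's combination with Q-learning. All class names, network sizes, and the soft gating masks are hypothetical illustrations, not the authors' implementation or hyperparameters.

```python
# Minimal sketch of reward decomposition on top of Q-learning, in the spirit of
# RD2's sub-state / sub-Q structure. Names, sizes, and the gating scheme are
# hypothetical and do not reproduce the paper's architecture or settings.
import torch
import torch.nn as nn
import torch.nn.functional as F


class DecomposedQNet(nn.Module):
    """Shared encoder whose representation is softly split into K sub-states,
    each feeding its own sub-Q head; the full Q-value is the sum of sub-Qs."""

    def __init__(self, obs_dim: int, n_actions: int, n_channels: int = 2, hidden: int = 128):
        super().__init__()
        self.n_channels = n_channels
        self.encoder = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        # One learnable mask per reward channel selects a sub-representation.
        self.masks = nn.Parameter(torch.zeros(n_channels, hidden))
        self.q_heads = nn.ModuleList(
            [nn.Linear(hidden, n_actions) for _ in range(n_channels)]
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        h = self.encoder(obs)                # (B, hidden)
        gates = torch.sigmoid(self.masks)    # (K, hidden), soft gating
        # Sub-Q values per channel: (B, K, n_actions)
        return torch.stack(
            [head(h * gates[k]) for k, head in enumerate(self.q_heads)], dim=1
        )

    def total_q(self, obs: torch.Tensor) -> torch.Tensor:
        return self.forward(obs).sum(dim=1)  # (B, n_actions)


def td_loss(net, target_net, batch, gamma: float = 0.99) -> torch.Tensor:
    """One-step TD loss on the summed Q-value; only the total reward is
    observed, matching the setting where sub-rewards are unknown."""
    obs, act, rew, next_obs, done = batch
    q = net.total_q(obs).gather(1, act.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        next_q = target_net.total_q(next_obs).max(dim=1).values
        target = rew + gamma * (1.0 - done) * next_q
    return F.smooth_l1_loss(q, target)


if __name__ == "__main__":
    net, target_net = DecomposedQNet(8, 4), DecomposedQNet(8, 4)
    target_net.load_state_dict(net.state_dict())
    batch = (torch.randn(32, 8), torch.randint(0, 4, (32,)),
             torch.randn(32), torch.randn(32, 8), torch.zeros(32))
    loss = td_loss(net, target_net, batch)
    loss.backward()
    print(float(loss))
```

The paper additionally trains disentanglement objectives so that each sub-state predicts its own sub-reward; that part is omitted here, since the exact losses and their weights are only given in the appendix.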
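The Open Datasets row points to mini-gridworld (Chevalier-Boisvert et al., 2018) and Atari games. As a hedged illustration of how such an environment is typically loaded, the snippet below uses the gym-minigrid package with the classic Gym API; the registered `MiniGrid-Empty-8x8-v0` task is only a stand-in, since the paper's Monster-Treasure configuration is a custom layout that is not shipped with the package.

```python
# Assumes the gym-minigrid package and the classic (pre-0.26) Gym step API.
import gym
import gym_minigrid  # noqa: F401  (importing registers the MiniGrid-* environments)

env = gym.make("MiniGrid-Empty-8x8-v0")  # stand-in; Monster-Treasure is a custom layout
obs = env.reset()
done = False
while not done:
    obs, reward, done, info = env.step(env.action_space.sample())
env.close()
```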