Reward Propagation Using Graph Convolutional Networks

Authors: Martin Klissarov, Doina Precup

NeurIPS 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We verify empirically that our approach can achieve considerable improvements in both small and high-dimensional control problems. We first evaluate our approach in tabular domains, where we achieve performance similar to potential-based reward shaping built on the forward-backward algorithm. Unlike hand-engineered potential functions, our method scales naturally to more complex environments; we illustrate this on navigation-based vision tasks from the MiniWorld environment [Chevalier-Boisvert, 2018], on a variety of games from the Atari 2600 benchmark [Bellemare et al., 2012], and on a set of continuous control environments based on MuJoCo [Todorov et al., 2012], where our method fares significantly better than actor-critic algorithms [Sutton et al., 1999a, Schulman et al., 2017] and additional baselines.
Researcher Affiliation | Collaboration | Martin Klissarov (Mila, McGill University, martin.klissarov@mail.mcgill.ca); Doina Precup (Mila, McGill University and DeepMind, dprecup@cs.mcgill.ca)
Pseudocode | Yes | Algorithm 1: Reward shaping using GCNs (a minimal sketch of the shaping step appears after this table).
Open Source Code | No | The paper does not explicitly state that its own code for the described methodology is open source or provide a link to it; it only references third-party open-source implementations used as baselines.
Open Datasets | Yes | MiniWorld environment [Chevalier-Boisvert, 2018], Atari 2600 benchmark [Bellemare et al., 2012], MuJoCo [Todorov et al., 2012], CIFAR-10 images [Krizhevsky et al.]
Dataset Splits | No | The paper mentions 'Validation accuracy on the Cora dataset' in Figure 2b, but it does not provide specific training/validation/test splits (percentages, counts, or references to predefined splits) for the main experimental environments (MiniWorld, Atari, MuJoCo).
Hardware Specification | Yes | We did these evaluations on a single V100 GPU, 8 CPUs, and 40 GB of RAM.
Software Dependencies | No | The paper mentions software such as 'PyTorch' but does not provide specific version numbers for any libraries, frameworks, or solvers used in the experiments.
Experiment Setup | Yes | All details about hyperparameters and network architectures are provided in Appendix A.2. An important hyperparameter in our approach is α, which trades off between the reward-shaped return and the default return (a hedged illustration of this trade-off follows below). We also investigate the role of η, the hyperparameter trading off between the two losses of the GCN, in Appendix A.4.
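
Notes on Algorithm 1 (reward shaping using GCNs): the paper's core idea is to learn a potential function Φ over states with a graph convolutional network and then apply standard potential-based shaping, r + γΦ(s') − Φ(s). Since the table records that the authors' code is not released, the sketch below is only a minimal illustration, not their implementation: rewards are propagated over a toy state graph using the usual GCN normalization D^{-1/2}(A + I)D^{-1/2}, and the function names, the fixed number of propagation steps, and the toy chain environment are all assumptions.

```python
# Illustrative sketch of potential-based reward shaping with a
# graph-convolution-style propagation step. This is NOT the authors'
# released code; the one-step linear propagation, the step count, and
# all names are assumptions made for illustration.
import numpy as np

GAMMA = 0.99  # discount factor

def normalized_adjacency(A):
    """Symmetric normalization used by GCNs: D^{-1/2} (A + I) D^{-1/2}."""
    A_hat = A + np.eye(A.shape[0])          # add self-loops
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return D_inv_sqrt @ A_hat @ D_inv_sqrt

def propagate_potential(A, rewards, steps=10):
    """Propagate observed rewards over the state graph to obtain a
    potential Phi(s) for every state (a crude stand-in for the learned
    GCN potential described in the paper)."""
    A_norm = normalized_adjacency(A)
    phi = rewards.astype(float).copy()
    for _ in range(steps):
        phi = A_norm @ phi                  # one graph-convolution pass
    return phi

def shaped_reward(r, phi_s, phi_next):
    """Standard potential-based shaping: r + gamma*Phi(s') - Phi(s)."""
    return r + GAMMA * phi_next - phi_s

# Toy 4-state chain with reward only in the terminal state.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
rewards = np.array([0.0, 0.0, 0.0, 1.0])
phi = propagate_potential(A, rewards)
print(shaped_reward(0.0, phi[0], phi[1]))   # shaping signal for a 0 -> 1 step
```

Because the shaping term is a potential difference, it densifies the sparse terminal reward along the chain without changing the optimal policy, which is the property that makes potential-based shaping safe to apply.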
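On the α trade-off from the Experiment Setup row: the exact formulation lives in the paper's Appendix A.2, which is not reproduced here. One plausible hedged reading, used below, weights the shaping term by α so that α = 0 recovers the default (unshaped) reward; the helper name and this additive form are assumptions for illustration.

```python
# Hypothetical reading of the alpha trade-off between the reward-shaped
# return and the default return. The exact form is specified in the
# paper's Appendix A.2; this additive weighting is an assumption.
def alpha_mixed_reward(r, phi_s, phi_next, alpha, gamma=0.99):
    shaping = gamma * phi_next - phi_s   # potential-based shaping term
    return r + alpha * shaping           # alpha = 0 -> default reward

if __name__ == "__main__":
    # Example: zero environment reward, potentials 0.2 -> 0.5
    print(alpha_mixed_reward(0.0, 0.2, 0.5, alpha=0.5))  # 0.5*(0.99*0.5 - 0.2)
```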