Reward Propagation Using Graph Convolutional Networks
Authors: Martin Klissarov, Doina Precup
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We verify empirically that our approach can achieve considerable improvements in both small and high-dimensional control problems. We first evaluate our approach in tabular domains where we achieve similar performance compared to potential based reward shaping built on the forward-backward algorithm. Unlike hand-engineered potential functions, our method scales naturally to more complex environments; we illustrate this on navigation-based vision tasks from the MiniWorld environment [Chevalier-Boisvert, 2018], on a variety of games from the Atari 2600 benchmark [Bellemare et al., 2012] and on a set of continuous control environments based on MuJoCo [Todorov et al., 2012], where our method fares significantly better than actor-critic algorithms [Sutton et al., 1999a, Schulman et al., 2017] and additional baselines. |
| Researcher Affiliation | Collaboration | Martin Klissarov Mila, McGill University martin.klissarov@mail.mcgill.ca Doina Precup Mila, McGill University and DeepMind dprecup@cs.mcgill.ca |
| Pseudocode | Yes | Algorithm 1: Reward shaping using GCNs |
| Open Source Code | No | The paper does not explicitly state that its own code for the described methodology is open-source or provide a link to it. It only references third-party open-source implementations used as baselines. |
| Open Datasets | Yes | MiniWorld environment [Chevalier-Boisvert, 2018], Atari 2600 benchmark [Bellemare et al., 2012], MuJoCo [Todorov et al., 2012], CIFAR-10 images [Krizhevsky et al.] |
| Dataset Splits | No | The paper mentions 'Validation accuracy on the Cora dataset' in Figure 2b, but it does not provide specific training/validation/test splits (percentages, counts, or references to predefined splits) for the main experimental environments (Mini World, Atari, MuJoCo). |
| Hardware Specification | Yes | We did these evaluations on a single V100 GPU, 8 CPUs and 40GB of RAM. |
| Software Dependencies | No | The paper mentions software like 'PyTorch' but does not provide specific version numbers for any libraries, frameworks, or solvers used in the experiments. |
| Experiment Setup | Yes | All details about hyperparameters and network architectures are provided in Appendix A.2. An important hyperparameter in our approach is α, which trades off between the reward-shaped return and the default return. We also investigate the role of η, the hyperparameter trading off between the two losses of the GCN, in Appendix A.4. |
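For context on the method summarized above, a minimal sketch of potential-based reward shaping follows. It assumes the standard formulation r' = r + γΦ(s') − Φ(s); in the paper the potential Φ is produced by a GCN, whereas here it is a hand-written lookup table, and the convex α-mix of shaped and default rewards is an illustrative reading of the α trade-off mentioned in the Experiment Setup row, not the authors' released code.

```python
import numpy as np

def shaped_reward(r, phi_s, phi_next, gamma=0.99):
    """Standard potential-based shaping: r' = r + gamma * Phi(s') - Phi(s)."""
    return r + gamma * phi_next - phi_s

def mixed_reward(r_shaped, r_default, alpha=0.5):
    """Illustrative alpha trade-off between the shaped and default signals
    (the exact mixing form used in the paper is specified in its Appendix)."""
    return alpha * np.asarray(r_shaped) + (1.0 - alpha) * np.asarray(r_default)

# Toy usage: a learned potential (a GCN output in the paper) is just a
# hypothetical per-state scalar here.
phi = {0: 0.1, 1: 0.4, 2: 1.0}
r_prime = shaped_reward(r=0.0, phi_s=phi[0], phi_next=phi[1])
print(r_prime)                      # 0.296 with gamma=0.99
print(mixed_reward(r_prime, 0.0))   # 0.148 with alpha=0.5
```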