Self-Supervised Attention-Aware Reinforcement Learning
Authors: Haiping Wu, Khimya Khetarpal, Doina Precup
AAAI 2021, pp. 10311-10319
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically show that the self-supervised attention-aware deep RL methods outperform the baselines in the context of both the rate of convergence and performance. Furthermore, the proposed self-supervised attention is not tied with specific policies, nor restricted to a specific scene. We posit that the proposed approach is a general self-supervised attention module for multi-task learning and transfer learning, and empirically validate the generalization ability of the proposed method. |
| Researcher Affiliation | Collaboration | Haiping Wu (1, 2), Khimya Khetarpal (1, 2), Doina Precup (1, 2, 3); 1: McGill University, 2: Mila, 3: Google DeepMind, Montreal. {haiping.wu2, khimya.khetarpal}@mail.mcgill.ca, dprecup@cs.mcgill.ca |
| Pseudocode | Yes | The pseudocode is provided in the appendix. |
| Open Source Code | Yes | The source code is available at https://github.com/happywu/Self-Sup-Attention-RL. |
| Open Datasets | Yes | We evaluate our method on Atari ALE (Bellemare et al. 2013; Brockman et al. 2016) games. |
| Dataset Splits | No | The paper mentions 'The implementation details including experiment setups, network architectures and hyperparameters are provided in the appendix.' and shows results averaged over '5 random seeds' but does not provide specific percentages or counts for training, validation, or test splits in the main text. |
| Hardware Specification | No | The paper states 'The implementation details including experiment setups, network architectures and hyperparameters are provided in the appendix.' but does not include any specific hardware details like GPU or CPU models in the main body. |
| Software Dependencies | No | While the paper provides a link to the source code, it does not explicitly list specific software dependencies with their version numbers within the text. |
| Experiment Setup | No | The paper states 'The implementation details including experiment setups, network architectures and hyperparameters are provided in the appendix.' but does not provide specific hyperparameters or system-level training settings in the main text. |
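The "Research Type" row above summarizes the paper's central claim about a self-supervised attention module for deep RL. As a rough illustration only, the sketch below shows a generic learned spatial attention mask applied to convolutional features; the class name, layer sizes, and sigmoid gating are assumptions made for illustration and are not taken from the paper's architecture.

```python
# Generic sketch of a learned spatial attention mask over CNN features.
# Assumption: this is NOT the paper's architecture; it only illustrates the
# idea of re-weighting convolutional features by an attention map.
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    def __init__(self, in_channels: int):
        super().__init__()
        # 1x1 convolution producing one attention logit per spatial location
        self.attn = nn.Conv2d(in_channels, 1, kernel_size=1)

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        # features: (batch, channels, height, width)
        logits = self.attn(features)        # (B, 1, H, W)
        weights = torch.sigmoid(logits)     # attention mask in [0, 1]
        return features * weights           # broadcast re-weighting of features

# Dummy Atari-like feature map, purely for shape checking
feats = torch.randn(4, 32, 20, 20)
attended = SpatialAttention(32)(feats)
print(attended.shape)  # torch.Size([4, 32, 20, 20])
```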
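The "Open Datasets" row points to the Atari ALE benchmark accessed through OpenAI Gym (Bellemare et al. 2013; Brockman et al. 2016). A minimal sketch of loading such an environment is shown below; the game id ("PongNoFrameskip-v4") and the random placeholder policy are illustrative assumptions, not the paper's actual training setup.

```python
# Minimal sketch: loading an Atari ALE environment via OpenAI Gym,
# matching the Bellemare et al. (2013) / Brockman et al. (2016) citations.
# Assumes the classic gym API (4-tuple step return) and an installed Atari ROM set.
import gym

env = gym.make("PongNoFrameskip-v4")  # any Atari game id would do
obs = env.reset()
done, total_reward = False, 0.0
while not done:
    action = env.action_space.sample()          # placeholder random policy
    obs, reward, done, info = env.step(action)  # classic gym step signature
    total_reward += reward
env.close()
print("Episode return:", total_reward)
```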