Transfer Reinforcement Learning Using Output-Gated Working Memory
Authors: Arthur Williams, Joshua Phillips (pp. 1324-1331)
AAAI 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, the model's utility was tested on a temporally extended, partially observable 5x5 2D grid-world maze task that required the agent to learn 3 tasks over the duration of the training period. The results indicate that the addition of output gating increases the initial learning performance of an agent in target tasks and decreases the learning time required to reach a fixed performance threshold. |
| Researcher Affiliation | Academia | Arthur S. Williams Center for Computational Science Middle Tennessee State University Murfreesboro, TN, USA asw3x@mtmail.mtsu.edu Joshua L. Phillips Department of Computer Science Middle Tennessee State University Murfreesboro, TN, USA Joshua.Phillips@mtsu.edu |
| Pseudocode | No | The paper provides mathematical formulas and descriptions of the model's operations, but no structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Also, the source code is available online at: https://github.com/arthurw125/AAAI20_Transfer. |
| Open Datasets | No | The paper utilizes a custom 5x5 2D grid-world maze task described within the paper, but no link or citation to a publicly available dataset for this environment is provided. |
| Dataset Splits | No | The paper describes taking the mean of 100 training sample runs and measuring error, but does not specify explicit train/validation/test dataset splits. The grid-world environment is a simulation, not a static dataset with predefined splits. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for experiments. |
| Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., Python, libraries, frameworks). |
| Experiment Setup | Yes | Table 1: Parameter Descriptions and Values lists hyperparameters such as n (Size of HRR vectors), ε (Probability of non-greedy action choice), γ (Discount factor), α (Learning rate), λ (Trace decay), and b (Network bias) with their corresponding values. |
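The hyperparameters named in Table 1 (ε, γ, α, λ, and so on) are the standard knobs of an ε-greedy TD(λ) learner. As a minimal sketch of where such values plug in, the fragment below shows ε-greedy action selection and a tabular TD(λ) backup; all numeric values here are illustrative placeholders, not the paper's actual settings, and the function names are hypothetical:

```python
import random

# Illustrative placeholder values only; the paper's settings are in its Table 1.
EPSILON = 0.1   # probability of a non-greedy action choice (epsilon)
GAMMA = 0.9     # discount factor (gamma)
ALPHA = 0.1     # learning rate (alpha)
LAM = 0.5       # eligibility-trace decay (lambda)

def epsilon_greedy(q_values, eps=EPSILON, rng=random):
    """Pick the greedy action with probability 1 - eps, else a uniform random one."""
    if rng.random() < eps:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)

def td_lambda_update(values, traces, state, reward, next_state):
    """One tabular TD(lambda) backup: compute the TD error, decay all
    eligibility traces, bump the visited state's trace, then nudge every
    traced state's value estimate along the error."""
    delta = reward + GAMMA * values.get(next_state, 0.0) - values.get(state, 0.0)
    for s in traces:
        traces[s] *= GAMMA * LAM
    traces[state] = traces.get(state, 0.0) + 1.0
    for s, e in traces.items():
        values[s] = values.get(s, 0.0) + ALPHA * delta * e
    return delta
```

Note the paper's agent additionally maintains HRR-encoded working-memory contents with output gating, which this generic TD(λ) skeleton does not model.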