Transfer Reinforcement Learning Using Output-Gated Working Memory

Authors: Arthur Williams, Joshua Phillips (pp. 1324-1331)

AAAI 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Finally, the model's utility was tested on a temporally extended, partially observable 5x5 2D grid-world maze task that required the agent to learn 3 tasks over the duration of the training period. The results indicate that the addition of output gating increases the initial learning performance of an agent in target tasks and decreases the learning time required to reach a fixed performance threshold.
Researcher Affiliation | Academia | Arthur S. Williams, Center for Computational Science, Middle Tennessee State University, Murfreesboro, TN, USA (asw3x@mtmail.mtsu.edu); Joshua L. Phillips, Department of Computer Science, Middle Tennessee State University, Murfreesboro, TN, USA (Joshua.Phillips@mtsu.edu)
Pseudocode | No | The paper provides mathematical formulas and descriptions of the model's operations, but no structured pseudocode or algorithm blocks.
Open Source Code | Yes | Also, the source code is available online at: https://github.com/arthurw125/AAAI20_Transfer
Open Datasets | No | The paper uses a custom 5x5 2D grid-world maze task described within the paper, but no link or citation to a publicly available dataset for this environment is provided.
Dataset Splits | No | The paper describes taking the mean of 100 training sample runs and measuring error, but does not specify explicit train/validation/test dataset splits; the grid-world environment is a simulation rather than a static dataset with predefined splits.
Hardware Specification | No | The paper does not provide specific details about the hardware used for the experiments.
Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., Python, libraries, frameworks).
Experiment Setup | Yes | Table 1 (Parameter Descriptions and Values) lists hyperparameters such as n (size of HRR vectors), ε (probability of non-greedy action choice), γ (discount factor), α (learning rate), λ (trace decay), and b (network bias), along with their corresponding values.
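
To make the "Experiment Setup" row concrete, the sketch below shows one way the hyperparameters named in the paper's Table 1 could be organized for a re-implementation. The field names follow the parameter descriptions quoted above; the numeric values are illustrative placeholders only, since the actual Table 1 values are not reproduced in this report.

```python
from dataclasses import dataclass

@dataclass
class AgentConfig:
    """Hyperparameters named in Table 1 of the paper.

    The values below are placeholders for illustration; consult the
    paper's Table 1 (or the released source code) for the real values.
    """
    n: int = 1024          # size of the HRR (Holographic Reduced Representation) vectors
    epsilon: float = 0.1   # probability of a non-greedy (exploratory) action choice
    gamma: float = 0.9     # discount factor
    alpha: float = 0.1     # learning rate
    lam: float = 0.9       # eligibility-trace decay (lambda)
    b: float = 1.0         # network bias

config = AgentConfig()
print(config)
```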
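
The "Dataset Splits" row notes that evaluation is based on the mean of 100 training sample runs with a measured error, rather than on a static train/validation/test split. A minimal sketch of that kind of aggregation is shown below; run_training is a hypothetical stand-in that returns random data so the example runs end to end, not the paper's actual training procedure.

```python
import numpy as np

def run_training(seed: int, episodes: int = 500) -> np.ndarray:
    """Hypothetical stand-in for one independent training run.

    Returns a per-episode performance curve; here it is random data
    so the sketch is self-contained and executable.
    """
    rng = np.random.default_rng(seed)
    return rng.random(episodes)

# Average 100 independent runs, as described in the evaluation, and
# report the standard error of the mean across runs for each episode.
curves = np.stack([run_training(seed) for seed in range(100)])
mean_curve = curves.mean(axis=0)
sem_curve = curves.std(axis=0, ddof=1) / np.sqrt(curves.shape[0])
print(mean_curve[:5], sem_curve[:5])
```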