Transfer Reinforcement Learning Using Output-Gated Working Memory

Authors: Arthur Williams, Joshua Phillips (pp. 1324-1331)

AAAI 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Finally, the model's utility was tested on a temporally extended, partially observable 5x5 2D grid-world maze task that required the agent to learn 3 tasks over the duration of the training period. The results indicate that the addition of output gating increases the initial learning performance of an agent in target tasks and decreases the learning time required to reach a fixed performance threshold.
Researcher Affiliation | Academia | Arthur S. Williams, Center for Computational Science, Middle Tennessee State University, Murfreesboro, TN, USA (asw3x@mtmail.mtsu.edu); Joshua L. Phillips, Department of Computer Science, Middle Tennessee State University, Murfreesboro, TN, USA (Joshua.Phillips@mtsu.edu)
Pseudocode | No | The paper provides mathematical formulas and descriptions of the model's operations, but no structured pseudocode or algorithm blocks.
Open Source Code | Yes | Also, the source code is available online at: https://github.com/arthurw125/AAAI20_Transfer
Open Datasets | No | The paper uses a custom 5x5 2D grid-world maze task described within the paper, but no link or citation to a publicly available dataset for this environment is provided.
Dataset Splits | No | The paper describes taking the mean of 100 training sample runs and measuring error, but does not specify explicit train/validation/test dataset splits; the grid-world environment is a simulation rather than a static dataset with predefined splits.
Hardware Specification | No | The paper does not provide specific details about the hardware used for the experiments.
Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., Python, libraries, frameworks).
Experiment Setup | Yes | Table 1 (Parameter Descriptions and Values) lists hyperparameters such as n (size of HRR vectors), ε (probability of non-greedy action choice), γ (discount factor), α (learning rate), λ (trace decay), and b (network bias), along with their corresponding values.
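
To make the "Experiment Setup" row concrete, the sketch below shows one way the hyperparameters named in the paper's Table 1 could be organized for a re-implementation. The field names follow the parameter descriptions quoted above; the numeric values are illustrative placeholders only, since the actual Table 1 values are not reproduced in this report.

```python
from dataclasses import dataclass

@dataclass
class AgentConfig:
    """Hyperparameters named in Table 1 of the paper.

    The values below are placeholders for illustration; consult the
    paper's Table 1 (or the released source code) for the real values.
    """
    n: int = 1024          # size of the HRR (Holographic Reduced Representation) vectors
    epsilon: float = 0.1   # probability of a non-greedy (exploratory) action choice
    gamma: float = 0.9     # discount factor
    alpha: float = 0.1     # learning rate
    lam: float = 0.9       # eligibility-trace decay (lambda)
    b: float = 1.0         # network bias

config = AgentConfig()
print(config)
```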
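
The "Dataset Splits" row notes that evaluation is based on the mean of 100 training sample runs with a measured error, rather than on a static train/validation/test split. A minimal sketch of that kind of aggregation is shown below; run_training is a hypothetical stand-in that returns random data so the example runs end to end, not the paper's actual training procedure.

```python
import numpy as np

def run_training(seed: int, episodes: int = 500) -> np.ndarray:
    """Hypothetical stand-in for one independent training run.

    Returns a per-episode performance curve; here it is random data
    so the sketch is self-contained and executable.
    """
    rng = np.random.default_rng(seed)
    return rng.random(episodes)

# Average 100 independent runs, as described in the evaluation, and
# report the standard error of the mean across runs for each episode.
curves = np.stack([run_training(seed) for seed in range(100)])
mean_curve = curves.mean(axis=0)
sem_curve = curves.std(axis=0, ddof=1) / np.sqrt(curves.shape[0])
print(mean_curve[:5], sem_curve[:5])
```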