Composable Modular Reinforcement Learning

Authors: Christopher Simpkins, Charles Isbell

AAAI 2019, pp. 4975-4982

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments show that GM-Sarsa/Q-decomposition degrades when modules are modified to have incomparable reward scales and that Arbi-Q is robust to such modification. The learning curves depicted in Figure 2 show that the GM-Sarsa bunny agent with incomparable reward scales for its modules converges to a lower score than with comparable rewards. As Figure 3 shows, Arbi-Q does not exhibit any performance degradation when the agent's modules have incomparable reward scales.
Researcher Affiliation | Academia | Christopher Simpkins, Charles Isbell, College of Computing, Georgia Institute of Technology, 801 Atlantic Drive, Atlanta, GA 30332-0280, USA; chris.simpkins@gatech.edu, isbell@cc.gatech.edu
Pseudocode | Yes | Algorithm 1: Arbi-Q (sketch). An illustrative code sketch of the arbitration idea appears below the table.
Open Source Code | No | The paper does not contain any explicit statements or links indicating that the source code for the described methodology is publicly available.
Open Datasets | No | The paper uses a simulated 'Bunny-Wolf World' derived from prior work but does not provide access information (link, DOI, or citation with authors and year) for a public dataset in the usual sense of a static collection of data; it describes only the environment used to train the agents.
Dataset Splits | No | The paper describes how performance is evaluated during training ('suspending learning every n/100 steps to evaluate performance'), but it does not specify training, validation, or test splits of a fixed dataset.
Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU/CPU models, memory) used to run the experiments.
Software Dependencies | No | The paper mentions algorithms such as Sarsa but does not name any software with version numbers (e.g., Python libraries, frameworks, or specific solvers) used for the implementation.
Experiment Setup | Yes | Each algorithm used a discount rate of 0.9 and ϵ-greedy action selection during training, with ϵ linearly discounted from 0.4, as in Sprague and Ballard's experiments.
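
For concreteness, here is a minimal sketch of the reported training-time action selection: ϵ-greedy over a tabular Q-function with ϵ decayed linearly from 0.4. The starting value of 0.4 comes from the Experiment Setup row above; the final ϵ value, the schedule length, and all names are illustrative assumptions, not details taken from the paper.

```python
import random

def epsilon_schedule(step, total_steps, eps_start=0.4, eps_end=0.0):
    """Linearly decay epsilon from eps_start to eps_end over training.

    The 0.4 start follows the reported setup; the end value and schedule
    length are assumptions made for this sketch.
    """
    frac = min(step / total_steps, 1.0)
    return eps_start + frac * (eps_end - eps_start)

def epsilon_greedy_action(q_values, actions, epsilon):
    """With probability epsilon pick a uniformly random action,
    otherwise pick an action with maximal Q-value."""
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: q_values[a])
```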
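
The contrast summarized in the Research Type and Pseudocode rows can likewise be sketched as two action-selection rules over a set of reward modules, each holding its own Q-function: a greatest-mass rule in the spirit of GM-Sarsa/Q-decomposition, which sums module Q-values and is therefore sensitive to their reward scales, and an Arbi-Q-style rule, in which an arbitrator picks one module and defers to its greedy choice, so module values are never added or compared. This is a hedged reconstruction of the general idea, not the paper's Algorithm 1; the Module interface, the select_module callable, and all names are assumptions.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

State = Tuple[int, ...]   # placeholder state encoding for this sketch
Action = int

@dataclass
class Module:
    """One reward module with its own tabular Q-function (assumed interface)."""
    q_table: Dict[Tuple[State, Action], float] = field(default_factory=dict)

    def q(self, state: State, action: Action) -> float:
        return self.q_table.get((state, action), 0.0)

def greatest_mass_action(modules: List[Module], state: State,
                         actions: List[Action]) -> Action:
    """GM-Sarsa/Q-decomposition-style selection: maximize the summed module
    Q-values. A module whose rewards are on a much larger scale can dominate
    the sum, which is the failure mode probed in the experiments."""
    return max(actions, key=lambda a: sum(m.q(state, a) for m in modules))

def arbitrated_action(select_module: Callable[[State], int],
                      modules: List[Module], state: State,
                      actions: List[Action]) -> Action:
    """Arbi-Q-style selection (sketch): an arbitrator policy chooses which
    module gets control, and that module acts greedily on its own Q-values.
    Module Q-values are never summed, so incomparable reward scales do not
    interact."""
    chosen = modules[select_module(state)]
    return max(actions, key=lambda a: chosen.q(state, a))
```

The sketch abstracts the arbitrator as a callable from state to module index and does not attempt to reproduce how the arbitrator or the modules are trained.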