Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Composable Modular Reinforcement Learning
Authors: Christopher Simpkins, Charles Isbell | Pages: 4975-4982
AAAI 2019 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments show that GM-Sarsa/Q-decomposition degrades when modules are modified to have incomparable reward scales and that Arbi-Q is robust to such modification. The learning curves depicted in Figure 2 show that GM-Sarsa bunny agent with incomparable reward scales for its modules converges to a lower score than with comparable rewards. As Figure 3 shows, Arbi-Q does not exhibit any performance degradation when the agent's modules have incomparable reward scales. |
| Researcher Affiliation | Academia | Christopher Simpkins, Charles Isbell College of Computing Georgia Institute of Technology 801 Atlantic Drive Atlanta, GA 30332-0280, USA EMAIL, EMAIL |
| Pseudocode | Yes | Algorithm 1 Arbi-Q (sketch) |
| Open Source Code | No | The paper does not contain any explicit statements or links indicating that the source code for the described methodology is publicly available. |
| Open Datasets | No | The paper uses a simulated 'Bunny-Wolf World' derived from prior work but does not provide access information (link, DOI, citation with authors/year) for a public dataset in the typical sense of a static collection of data. It describes the environment used for training the agents. |
| Dataset Splits | No | The paper describes how performance is evaluated during training ('suspending learning every n/100 steps to evaluate performance'), but it does not specify training, validation, or test dataset splits in the context of a fixed dataset. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU/CPU models, memory) used to run the experiments. |
| Software Dependencies | No | The paper mentions algorithms like 'Sarsa' but does not specify any software names with version numbers (e.g., Python libraries, frameworks, or specific solvers) used for implementation. |
| Experiment Setup | Yes | Each algorithm used a discount rate of 0.9 and ϵ-greedy action selection during training with ϵ linearly discounted from 0.4, as in Sprague and Ballard's experiments. |
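
The experiment setup quoted above (discount rate 0.9, ϵ-greedy selection with ϵ decreased linearly from 0.4, and evaluation every n/100 steps) can be illustrated with a minimal Python sketch. This is not the authors' code: the function names, the final ϵ value of 0.0, and the tie-breaking rule are assumptions made for illustration, since the excerpt does not state them.

```python
import random

GAMMA = 0.9          # discount rate quoted from the paper
EPSILON_START = 0.4  # initial exploration rate quoted from the paper
EPSILON_END = 0.0    # ASSUMPTION: the final epsilon is not stated in the excerpt


def linear_epsilon(step, total_steps, start=EPSILON_START, end=EPSILON_END):
    """Linearly interpolate epsilon from `start` (step 0) to `end` (last step)."""
    if total_steps <= 1:
        return end
    # Clamp the progress fraction to [0, 1] so out-of-range steps stay valid.
    frac = min(max(step / (total_steps - 1), 0.0), 1.0)
    return start + frac * (end - start)


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick a uniformly random action with probability epsilon, else a greedy one.

    Ties are broken in favor of the lowest action index (an assumption).
    """
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def evaluation_steps(total_steps, n_evals=100):
    """Steps at which learning would be suspended for evaluation (every n/100 steps)."""
    interval = max(total_steps // n_evals, 1)
    return list(range(interval, total_steps + 1, interval))
```

For a hypothetical run of 1000 training steps, `evaluation_steps(1000)` yields 100 evaluation points, and `linear_epsilon(step, 1000)` gives the exploration rate to pass to `epsilon_greedy` at each step.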