Composable Modular Reinforcement Learning
Authors: Christopher Simpkins, Charles Isbell (pp. 4975-4982)
AAAI 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments show that GM-Sarsa/Q-decomposition degrades when modules are modified to have incomparable reward scales and that Arbi-Q is robust to such modification. The learning curves depicted in Figure 2 show that the GM-Sarsa bunny agent with incomparable reward scales for its modules converges to a lower score than with comparable rewards. As Figure 3 shows, Arbi-Q does not exhibit any performance degradation when the agent's modules have incomparable reward scales. (A sketch contrasting the two action-selection schemes follows the table.) |
| Researcher Affiliation | Academia | Christopher Simpkins, Charles Isbell, College of Computing, Georgia Institute of Technology, 801 Atlantic Drive, Atlanta, GA 30332-0280, USA; chris.simpkins@gatech.edu, isbell@cc.gatech.edu |
| Pseudocode | Yes | Algorithm 1 Arbi-Q (sketch) |
| Open Source Code | No | The paper does not contain any explicit statements or links indicating that the source code for the described methodology is publicly available. |
| Open Datasets | No | The paper uses a simulated 'Bunny-Wolf World' derived from prior work but does not provide access information (link, DOI, or citation with authors and year) for a public dataset in the usual sense of a static data collection; it only describes the environment used to train the agents. |
| Dataset Splits | No | The paper describes how performance is evaluated during training ('suspending learning every n/100 steps to evaluate performance'), but it does not specify training, validation, or test dataset splits in the context of a fixed dataset. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU/CPU models, memory) used to run the experiments. |
| Software Dependencies | No | The paper mentions algorithms like 'Sarsa' but does not specify any software names with version numbers (e.g., Python libraries, frameworks, or specific solvers) used for implementation. |
| Experiment Setup | Yes | Each algorithm used a discount rate of 0.9 and ϵ-greedy action selection during training, with ϵ linearly decayed from 0.4, as in Sprague and Ballard's experiments. (A sketch of this schedule follows the table.) |
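
The contrast reported above between GM-Sarsa/Q-decomposition and Arbi-Q comes down to how per-module value estimates are turned into a single action. Below is a minimal, hypothetical tabular sketch (all names and array shapes are ours; it does not reproduce the paper's Algorithm 1): GM-Sarsa sums module Q-values before taking an argmax, so a module with an outsized reward scale dominates the choice, whereas an Arbi-Q-style arbitrator first picks a module and then defers to that module's own greedy action.

```python
import numpy as np

def gm_action(module_qs, state):
    """'Greatest mass' selection in the style of GM-Sarsa / Q-decomposition:
    take the action whose Q-values, summed across all modules, are largest.
    A module whose rewards sit on a much larger scale dominates this sum,
    which is the degradation the paper's Figure 2 illustrates."""
    summed = sum(q[state] for q in module_qs)  # vector of per-action totals
    return int(np.argmax(summed))

def arbiq_action(arbitrator_q, module_qs, state):
    """Arbi-Q-style selection (sketch): an arbitrator first chooses which
    module to follow in this state, then that module picks its own greedy
    action, so module reward scales are never compared directly."""
    module = int(np.argmax(arbitrator_q[state]))     # arbitrator's choice of module
    return int(np.argmax(module_qs[module][state]))  # that module's preferred action
```

Here `module_qs` is assumed to be a list of per-module Q-tables indexed as `q[state]` (a vector over actions), and `arbitrator_q[state]` a vector over module indices; both conventions are illustrative assumptions, not the paper's notation.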
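The experiment-setup row reports a discount rate of 0.9 and ϵ-greedy exploration with ϵ decayed linearly from 0.4. The following is a minimal sketch of such a schedule plus a single tabular Sarsa step; the training horizon, the learning rate α, and all function names are assumptions, since the table does not state them.

```python
import numpy as np

GAMMA = 0.9            # discount rate reported for all algorithms
EPS_START = 0.4        # initial exploration rate, decayed linearly during training
TRAIN_STEPS = 100_000  # assumed horizon; not specified in the table above

def epsilon_at(step: int) -> float:
    """Linear decay from EPS_START toward 0 over TRAIN_STEPS (endpoint assumed)."""
    return max(0.0, EPS_START * (1.0 - step / TRAIN_STEPS))

def epsilon_greedy(q_values: np.ndarray, step: int, rng: np.random.Generator) -> int:
    """Pick a random action with probability ε(step), otherwise the greedy one."""
    if rng.random() < epsilon_at(step):
        return int(rng.integers(q_values.shape[0]))
    return int(np.argmax(q_values))

def sarsa_update(q, s, a, r, s_next, a_next, alpha=0.1):
    """One tabular Sarsa step using the reported discount rate (α is assumed)."""
    q[s, a] += alpha * (r + GAMMA * q[s_next, a_next] - q[s, a])
```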