Composable Modular Reinforcement Learning

Authors: Christopher Simpkins, Charles Isbell

AAAI 2019, pp. 4975-4982

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments show that GM-Sarsa/Q-decomposition degrades when modules are modified to have incomparable reward scales and that Arbi-Q is robust to such modification. The learning curves depicted in Figure 2 show that the GM-Sarsa bunny agent with incomparable reward scales for its modules converges to a lower score than with comparable rewards. As Figure 3 shows, Arbi-Q does not exhibit any performance degradation when the agent's modules have incomparable reward scales.
Researcher Affiliation | Academia | Christopher Simpkins, Charles Isbell, College of Computing, Georgia Institute of Technology, 801 Atlantic Drive, Atlanta, GA 30332-0280, USA; chris.simpkins@gatech.edu, isbell@cc.gatech.edu
Pseudocode | Yes | Algorithm 1: Arbi-Q (sketch). An illustrative code sketch of the arbitration idea appears below the table.
Open Source Code | No | The paper does not contain any explicit statements or links indicating that the source code for the described methodology is publicly available.
Open Datasets | No | The paper uses a simulated 'Bunny-Wolf World' derived from prior work but does not provide access information (link, DOI, or citation with authors and year) for a public dataset in the usual sense of a static collection of data; it describes only the environment used to train the agents.
Dataset Splits | No | The paper describes how performance is evaluated during training ('suspending learning every n/100 steps to evaluate performance'), but it does not specify training, validation, or test splits of a fixed dataset.
Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU/CPU models, memory) used to run the experiments.
Software Dependencies | No | The paper mentions algorithms such as Sarsa but does not name any software with version numbers (e.g., Python libraries, frameworks, or specific solvers) used for the implementation.
Experiment Setup | Yes | Each algorithm used a discount rate of 0.9 and ϵ-greedy action selection during training, with ϵ linearly discounted from 0.4, as in Sprague and Ballard's experiments.
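
For concreteness, here is a minimal sketch of the reported training-time action selection: ϵ-greedy over a tabular Q-function with ϵ decayed linearly from 0.4. The starting value of 0.4 comes from the Experiment Setup row above; the final ϵ value, the schedule length, and all names are illustrative assumptions, not details taken from the paper.

```python
import random

def epsilon_schedule(step, total_steps, eps_start=0.4, eps_end=0.0):
    """Linearly decay epsilon from eps_start to eps_end over training.

    The 0.4 start follows the reported setup; the end value and schedule
    length are assumptions made for this sketch.
    """
    frac = min(step / total_steps, 1.0)
    return eps_start + frac * (eps_end - eps_start)

def epsilon_greedy_action(q_values, actions, epsilon):
    """With probability epsilon pick a uniformly random action,
    otherwise pick an action with maximal Q-value."""
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: q_values[a])
```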
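
The contrast summarized in the Research Type and Pseudocode rows can likewise be sketched as two action-selection rules over a set of reward modules, each holding its own Q-function: a greatest-mass rule in the spirit of GM-Sarsa/Q-decomposition, which sums module Q-values and is therefore sensitive to their reward scales, and an Arbi-Q-style rule, in which an arbitrator picks one module and defers to its greedy choice, so module values are never added or compared. This is a hedged reconstruction of the general idea, not the paper's Algorithm 1; the Module interface, the select_module callable, and all names are assumptions.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

State = Tuple[int, ...]   # placeholder state encoding for this sketch
Action = int

@dataclass
class Module:
    """One reward module with its own tabular Q-function (assumed interface)."""
    q_table: Dict[Tuple[State, Action], float] = field(default_factory=dict)

    def q(self, state: State, action: Action) -> float:
        return self.q_table.get((state, action), 0.0)

def greatest_mass_action(modules: List[Module], state: State,
                         actions: List[Action]) -> Action:
    """GM-Sarsa/Q-decomposition-style selection: maximize the summed module
    Q-values. A module whose rewards are on a much larger scale can dominate
    the sum, which is the failure mode probed in the experiments."""
    return max(actions, key=lambda a: sum(m.q(state, a) for m in modules))

def arbitrated_action(select_module: Callable[[State], int],
                      modules: List[Module], state: State,
                      actions: List[Action]) -> Action:
    """Arbi-Q-style selection (sketch): an arbitrator policy chooses which
    module gets control, and that module acts greedily on its own Q-values.
    Module Q-values are never summed, so incomparable reward scales do not
    interact."""
    chosen = modules[select_module(state)]
    return max(actions, key=lambda a: chosen.q(state, a))
```

The sketch abstracts the arbitrator as a callable from state to module index and does not attempt to reproduce how the arbitrator or the modules are trained.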