Generation of Policy-Level Explanations for Reinforcement Learning

Authors: Nicholay Topin, Manuela Veloso (pp. 2514-2521)

AAAI 2019

Reproducibility Assessment

Variable: Research Type
Result: Experimental
LLM Response: We evaluate APG Gen on a novel domain with scalable state space and controllable stochasticity. We describe this domain, Prereq World, in Section 5.1. The results are presented in Figure 6. Note that the x-axis is in log scale. The explanation size grows sub-linearly in m while the state-space size grows exponentially in m.

Variable: Researcher Affiliation
Result: Academia
LLM Response: Nicholay Topin, Manuela Veloso. Machine Learning Department, Carnegie Mellon University, Pittsburgh, PA 15213. {ntopin, veloso}@cs.cmu.edu

Variable: Pseudocode
Result: Yes
LLM Response: The pseudocode for our method is given in Algorithm 1.

Variable: Open Source Code
Result: No
LLM Response: The paper does not provide any explicit statement about releasing source code or a link to a code repository.

Variable: Open Datasets
Result: No
LLM Response: The paper introduces a novel domain called 'Prereq World' and describes its generation, but it does not state that this dataset is publicly available or provide any access details (link, DOI, specific citation with authors/year, or repository).

Variable: Dataset Splits
Result: No
LLM Response: The paper does not explicitly provide training/test/validation dataset splits (e.g., percentages, sample counts, or citations to predefined splits) for the 'Prereq World' domain, nor does it specify a cross-validation setup.

Variable: Hardware Specification
Result: No
LLM Response: The paper does not provide specific details about the hardware (e.g., GPU/CPU models, processors, memory) used to run the experiments.

Variable: Software Dependencies
Result: No
LLM Response: The paper mentions using 'value iteration' but does not provide specific ancillary software names with version numbers (e.g., library names, framework versions, or solver versions).

Variable: Experiment Setup
Result: Yes
LLM Response: APG Gen Stopping Criterion (ϵ): In the case of binary features, FIRM corresponds to the expected change should the feature be changed from 0 to 1... We use this as a guideline for setting ϵ: we set ϵ to be the minimum difference in action-value between the best action and second-best action. For the Prereq World domain, this is ϵ = 1. Trials: For each plotted data-point, we generate 100 different Prereq World instances. We evaluate each instance 1,000 times...
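The stopping-criterion guideline quoted above (set ϵ to the minimum gap in action-value between the best and second-best action) can be sketched in a few lines. This is an illustrative reconstruction, not the authors' code: the function name and the toy Q-table are hypothetical, and the Q-values are assumed to be given as a (states × actions) array.

```python
import numpy as np

def stopping_epsilon(q_values: np.ndarray) -> float:
    """Guideline from the paper: epsilon is the minimum, over states,
    of the gap between the best and second-best action-value.
    q_values: array of shape (n_states, n_actions), n_actions >= 2."""
    # Sort each row in descending order, then take best minus second best.
    sorted_q = np.sort(q_values, axis=1)[:, ::-1]
    gaps = sorted_q[:, 0] - sorted_q[:, 1]
    return float(gaps.min())

# Toy Q-table with made-up values; per the paper, in the Prereq World
# domain this guideline yields epsilon = 1.
q = np.array([[3.0, 2.0, 0.5],
              [5.0, 4.0, 4.0]])
print(stopping_epsilon(q))  # → 1.0
```

Note that states where two actions tie for best would drive ϵ to zero, so in practice one might restrict the minimum to states with a strictly positive gap.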