Iterative Bounding MDPs: Learning Interpretable Policies via Non-Interpretable Methods

Authors: Nicholay Topin, Stephanie Milani, Fei Fang, Manuela Veloso
Pages: 9923-9931

AAAI 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We empirically show the benefits of our approach by solving IBMDPs to produce decision tree policies for the base MDPs.
Researcher Affiliation | Academia | Nicholay Topin, Stephanie Milani, Fei Fang, Manuela Veloso; Carnegie Mellon University, Pittsburgh, PA 15213; {ntopin, smilani, feif, veloso}@cs.cmu.edu
Pseudocode | Yes | Algorithm 1 Extract a Decision Tree Policy from an IBMDP policy π, beginning traversal from obs. ... procedure SUBTREE FROM POLICY(obs, π)
Open Source Code | No | The paper mentions that 'Further environment details and experiment parameters are in the Appendix (available at arxiv.org/abs/2102.13045)', but this does not explicitly state that the source code for the methodology is provided.
Open Datasets | Yes | Cart Pole (Barto, Sutton, and Anderson 1983) ... We use the OpenAI Gym (Brockman et al. 2016) variant: ... Prereq World (Topin and Veloso 2019) ... Pothole World We introduce a new domain...
Dataset Splits | No | The paper describes experiments in reinforcement learning environments and mentions '50 trials' for evaluation, but it does not specify explicit training, validation, and test dataset splits in the conventional sense for static datasets.
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used for running the experiments.
Software Dependencies | No | The paper mentions using 'DDQN, PPO, and MFEC' and 'OpenAI Gym', but it does not provide specific version numbers for any of these software components or libraries.
Experiment Setup | No | The paper states 'Further environment details and experiment parameters are in the Appendix (available at arxiv.org/abs/2102.13045)', indicating that such details are not in the main text. The main text describes the modifications to algorithms but does not provide concrete hyperparameters or system-level training settings.
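
The Pseudocode entry above quotes only the header of Algorithm 1. As a rough illustration of what extracting a decision tree from an IBMDP policy might look like, here is a minimal Python sketch. It is not the authors' code. It assumes that the policy sees only per-feature (lower, upper) bounds, that it returns either a base action or a request to split one feature, and that the split threshold is an absolute value. All names (Leaf, Split, Query, subtree_from_policy, choose_action, toy_policy) are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Tuple, Union

# Assumption: an observation is a tuple of (lower, upper) bounds, one per feature.
Bounds = Tuple[Tuple[float, float], ...]


@dataclass(frozen=True)
class Query:
    """Hypothetical information-gathering action: compare a feature to a threshold."""
    feature: int
    threshold: float


@dataclass(frozen=True)
class Leaf:
    action: int  # base-MDP action chosen at this leaf


@dataclass(frozen=True)
class Split:
    feature: int
    threshold: float
    low: "Tree"   # subtree used when the feature value is <= threshold
    high: "Tree"  # subtree used when the feature value is > threshold


Tree = Union[Leaf, Split]
Policy = Callable[[Bounds], Union[int, Query]]


def subtree_from_policy(obs: Bounds, policy: Policy, max_depth: int = 32) -> Tree:
    """Recursively query the policy, tightening bounds after each split."""
    if max_depth < 0:
        raise ValueError("policy kept splitting; depth limit exceeded")
    act = policy(obs)
    if not isinstance(act, Query):
        return Leaf(act)  # a base action ends the traversal
    f, thr = act.feature, act.threshold
    lo, hi = obs[f]
    # Tighten the upper bound for the "low" child and the lower bound for the "high" child.
    low_obs = obs[:f] + ((lo, thr),) + obs[f + 1:]
    high_obs = obs[:f] + ((thr, hi),) + obs[f + 1:]
    return Split(
        f,
        thr,
        subtree_from_policy(low_obs, policy, max_depth - 1),
        subtree_from_policy(high_obs, policy, max_depth - 1),
    )


def choose_action(tree: Tree, state: Tuple[float, ...]) -> int:
    """Walk the extracted tree with a concrete base-MDP state."""
    while isinstance(tree, Split):
        tree = tree.low if state[tree.feature] <= tree.threshold else tree.high
    return tree.action


def toy_policy(obs: Bounds) -> Union[int, Query]:
    """Toy one-feature policy: split once at 0.5, then act."""
    lo, hi = obs[0]
    if (lo, hi) == (0.0, 1.0):
        return Query(0, 0.5)
    return 0 if hi <= 0.5 else 1


tree = subtree_from_policy(((0.0, 1.0),), toy_policy)
```

With the toy policy, the extracted tree has a single split on feature 0 at 0.5, and choose_action(tree, (0.2,)) follows the low branch. The depth limit is a safeguard added for this sketch only; the summary above does not say how the paper bounds tree depth.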