reproducibilityindex.ai

Iterative Bounding MDPs: Learning Interpretable Policies via Non-Interpretable Methods

Authors: Nicholay Topin, Stephanie Milani, Fei Fang, Manuela Veloso | Pages: 9923-9931

AAAI 2021

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We empirically show the benefits of our approach by solving IBMDPs to produce decision tree policies for the base MDPs.
Researcher Affiliation	Academia	Nicholay Topin, Stephanie Milani, Fei Fang, Manuela Veloso Carnegie Mellon University Pittsburgh, PA 15213 {ntopin, smilani, feif, veloso}@cs.cmu.edu
Pseudocode	Yes	Algorithm 1 Extract a Decision Tree Policy from an IBMDP policy π, beginning traversal from obs. ... procedure SUBTREE FROM POLICY(obs, π)
Open Source Code	No	The paper mentions that 'Further environment details and experiment parameters are in the Appendix (available at arxiv.org/abs/2102.13045)', but this does not explicitly state that the source code for the methodology is provided.
Open Datasets	Yes	Cart Pole (Barto, Sutton, and Anderson 1983) ... We use the OpenAI Gym (Brockman et al. 2016) variant: ... Prereq World (Topin and Veloso 2019) ... Pothole World We introduce a new domain...
Dataset Splits	No	The paper describes experiments in reinforcement learning environments and mentions '50 trials' for evaluation, but it does not specify explicit training, validation, and test dataset splits in the conventional sense for static datasets.
Hardware Specification	No	The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used for running the experiments.
Software Dependencies	No	The paper mentions using 'DDQN, PPO, and MFEC' and 'OpenAI Gym', but it does not provide specific version numbers for any of these software components or libraries.
Experiment Setup	No	The paper states 'Further environment details and experiment parameters are in the Appendix (available at arxiv.org/abs/2102.13045)', indicating that such details are not in the main text. The main text describes the modifications to algorithms but does not provide concrete hyperparameters or system-level training settings.
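
The Pseudocode row above quotes only the header of Algorithm 1, which extracts a decision tree policy from an IBMDP policy π by traversing from an initial observation. The following is a minimal Python sketch of one plausible reading of that recursion, not the paper's algorithm verbatim. It assumes that an observation is a tuple of per-feature (lower, upper) bounds, that the policy returns either a base-MDP action (an int) or a (feature, threshold) query, and that a query tightens the bound on one feature. The names Leaf, Split, subtree_from_policy, toy_policy, and evaluate are hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass
from typing import Callable, Tuple, Union

# Assumption: an observation is one (lower, upper) bound pair per feature.
Bounds = Tuple[Tuple[float, float], ...]


@dataclass(frozen=True)
class Leaf:
    action: int  # base-MDP action chosen at this leaf


@dataclass(frozen=True)
class Split:
    feature: int      # index of the feature being tested
    threshold: float  # test is: state[feature] <= threshold
    low: "Tree"       # subtree when the test is true
    high: "Tree"      # subtree when the test is false


Tree = Union[Leaf, Split]
# Assumption: the policy maps bounds to a base action or a (feature, threshold) query.
Policy = Callable[[Bounds], Union[int, Tuple[int, float]]]


def subtree_from_policy(obs: Bounds, policy: Policy, max_depth: int = 32) -> Tree:
    """Recursively unroll a bounds-only policy into a decision tree."""
    act = policy(obs)
    if isinstance(act, int):
        return Leaf(act)  # base action: stop and emit a leaf
    if max_depth == 0:
        raise RecursionError("policy kept querying features past max_depth")
    feature, threshold = act
    lo, hi = obs[feature]
    if not lo < threshold < hi:
        raise ValueError("query threshold must lie strictly inside current bounds")
    # Each answer to the query tightens one side of the bound on `feature`.
    low_obs = obs[:feature] + ((lo, threshold),) + obs[feature + 1:]
    high_obs = obs[:feature] + ((threshold, hi),) + obs[feature + 1:]
    return Split(
        feature,
        threshold,
        subtree_from_policy(low_obs, policy, max_depth - 1),
        subtree_from_policy(high_obs, policy, max_depth - 1),
    )


def evaluate(tree: Tree, state: Tuple[float, ...]) -> int:
    """Run the extracted tree on a concrete base-MDP state."""
    while isinstance(tree, Split):
        tree = tree.low if state[tree.feature] <= tree.threshold else tree.high
    return tree.action


def toy_policy(obs: Bounds) -> Union[int, Tuple[int, float]]:
    """Hypothetical stand-in policy: query feature 0 once, then act."""
    lo, hi = obs[0]
    if (lo, hi) == (0.0, 1.0):
        return (0, 0.5)
    return 0 if hi <= 0.5 else 1


tree = subtree_from_policy(((0.0, 1.0),), toy_policy)
```

Because the sketched policy depends only on the feature bounds and not on the underlying state, the traversal never has to sample the environment. In practice the stand-in toy_policy would be replaced by a trained agent such as the DDQN or PPO learners the table mentions, wrapped to match the assumed return convention.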