Iterative Bounding MDPs: Learning Interpretable Policies via Non-Interpretable Methods
Authors: Nicholay Topin, Stephanie Milani, Fei Fang, Manuela Veloso
Pages: 9923-9931
AAAI 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically show the benefits of our approach by solving IBMDPs to produce decision tree policies for the base MDPs. |
| Researcher Affiliation | Academia | Nicholay Topin, Stephanie Milani, Fei Fang, Manuela Veloso; Carnegie Mellon University, Pittsburgh, PA 15213; {ntopin, smilani, feif, veloso}@cs.cmu.edu |
| Pseudocode | Yes | Algorithm 1 Extract a Decision Tree Policy from an IBMDP policy π, beginning traversal from obs. ... procedure SUBTREE FROM POLICY(obs, π) |
| Open Source Code | No | The paper mentions that 'Further environment details and experiment parameters are in the Appendix (available at arxiv.org/abs/2102.13045)', but this does not explicitly state that the source code for the methodology is provided. |
| Open Datasets | Yes | Cart Pole (Barto, Sutton, and Anderson 1983) ... We use the Open AI Gym (Brockman et al. 2016) variant: ... Prereq World (Topin and Veloso 2019) ... Pothole World We introduce a new domain... |
| Dataset Splits | No | The paper describes experiments in reinforcement learning environments and mentions '50 trials' for evaluation, but it does not specify explicit training, validation, and test dataset splits in the conventional sense for static datasets. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions using 'DDQN, PPO, and MFEC' and 'Open AI Gym', but it does not provide specific version numbers for any of these software components or libraries. |
| Experiment Setup | No | The paper states 'Further environment details and experiment parameters are in the Appendix (available at arxiv.org/abs/2102.13045)', indicating that such details are not in the main text. The main text describes the modifications to algorithms but does not provide concrete hyperparameters or system-level training settings. |
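
The Pseudocode row quotes only the header of Algorithm 1, so the following is a minimal sketch of what a recursive procedure with the signature SUBTREE FROM POLICY(obs, π) might look like, not the authors' implementation. It assumes that `obs` is a tuple of per-feature (low, high) bounds and that the policy returns either a base action or a (feature, threshold) split; the `("act", ...)` / `("split", ...)` encoding, the `Leaf` and `Node` classes, the `max_depth` guard, and `toy_policy` are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Tuple, Union

# Assumed observation format: one (low, high) bound per feature.
Bounds = Tuple[Tuple[float, float], ...]


@dataclass
class Leaf:
    action: int  # base-MDP action taken at this leaf


@dataclass
class Node:
    feature: int      # index of the feature being tested
    threshold: float  # split value
    left: "Tree"      # subtree for feature values in [low, threshold]
    right: "Tree"     # subtree for feature values in [threshold, high]


Tree = Union[Leaf, Node]


def subtree_from_policy(obs: Bounds, policy: Callable, max_depth: int = 10) -> Tree:
    """Recursively query the policy on bounds to build a tree (hypothetical sketch)."""
    choice = policy(obs)
    if choice[0] == "act":
        # The policy commits to a base action: this branch ends in a leaf.
        return Leaf(choice[1])
    if max_depth == 0:
        raise ValueError("policy kept splitting past the depth guard")
    _, feature, threshold = choice
    low, high = obs[feature]
    # Tighten the bound on the tested feature for each branch.
    left_obs = obs[:feature] + ((low, threshold),) + obs[feature + 1:]
    right_obs = obs[:feature] + ((threshold, high),) + obs[feature + 1:]
    return Node(
        feature,
        threshold,
        subtree_from_policy(left_obs, policy, max_depth - 1),
        subtree_from_policy(right_obs, policy, max_depth - 1),
    )


def toy_policy(obs: Bounds):
    """Hypothetical policy: split feature 0 once at 0.5, then act."""
    low, high = obs[0]
    if (low, high) == (0.0, 1.0):
        return ("split", 0, 0.5)
    return ("act", 0) if high <= 0.5 else ("act", 1)


tree = subtree_from_policy(((0.0, 1.0),), toy_policy)
```

Because the assumed policy sees only the bounds and never the underlying state, the recursion can enumerate every branch without running the environment. For the exact procedure, consult Algorithm 1 in the paper.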