Optimal Decision Tree Policies for Markov Decision Processes

Authors: Daniël Vos, Sicco Verwer

IJCAI 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present experiments comparing the performance of OMDTs with VIPER and dtcontrol. ... All of our experiments ran on a Linux machine with 16 Intel Xeon CPU cores and 72 GB of RAM total and used Gurobi 10.0.0 with default parameters. Each method ran on a single CPU core. |
| Researcher Affiliation | Academia | Daniël Vos, Sicco Verwer, Delft University of Technology, {d.a.vos, s.e.verwer}@tudelft.nl |
| Pseudocode | No | The paper describes the OMDT formulation using mathematical equations (1-8) and natural language, but it does not contain a structured pseudocode or algorithm block. |
| Open Source Code | Yes | The full code for OMDT and our experiments can be found on GitHub. (Footnote 3: https://github.com/tudelft-cda-lab/OMDT) |
| Open Datasets | Yes | For comparison we implemented 13 environments based on well-known MDPs from the literature, the sizes of these MDPs are given in Table 2. |
| Dataset Splits | No | The paper does not provide explicit training/test/validation dataset splits. Reinforcement learning often involves policy learning within an environment rather than static data splits. |
| Hardware Specification | Yes | All of our experiments ran on a Linux machine with 16 Intel Xeon CPU cores and 72 GB of RAM total and used Gurobi 10.0.0 with default parameters. Each method ran on a single CPU core. |
| Software Dependencies | Yes | All of our experiments ran on a Linux machine with 16 Intel Xeon CPU cores and 72 GB of RAM total and used Gurobi 10.0.0 with default parameters. |
| Experiment Setup | Yes | We consider an OMDT optimal when the relative gap between its objective and bound is proven to be less than 0.01%. We solved OMDTs for a depth of 3 for a maximum of 2 hours and display the results in Table 2. ... All runs were limited to 2 hours. |
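The solver settings reported above (Gurobi 10.0.0 with default parameters, one CPU core per method, a 0.01% relative optimality gap, and a 2-hour time limit) correspond to standard Gurobi parameters. The sketch below shows how such a configuration could be expressed with the gurobipy Python API; the model contents are placeholders and this is not the paper's OMDT formulation, only an illustration of the reported solver settings.

```python
import gurobipy as gp
from gurobipy import GRB

# Illustrative sketch only: it mirrors the solver settings reported above,
# not the OMDT MILP formulation (equations 1-8 in the paper).
model = gp.Model("omdt_solver_settings_sketch")

# "Each method ran on a single CPU core."
model.Params.Threads = 1

# "We consider an OMDT optimal when the relative gap between its objective
# and bound is proven to be less than 0.01%."
model.Params.MIPGap = 1e-4

# "All runs were limited to 2 hours." (Gurobi expects seconds.)
model.Params.TimeLimit = 2 * 60 * 60

# Placeholder variable and objective so the model is solvable as written;
# the real model would encode a depth-3 decision tree policy over the MDP.
x = model.addVar(vtype=GRB.BINARY, name="placeholder")
model.setObjective(x, GRB.MAXIMIZE)

model.optimize()

# After solving, model.MIPGap gives the proven relative gap and model.Status
# indicates whether the run hit the time limit (GRB.TIME_LIMIT).
```

With these parameters, a run that terminates with status GRB.OPTIMAL has a proven gap below 0.01%, matching the paper's optimality criterion; runs stopped at GRB.TIME_LIMIT report the best tree found within 2 hours.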