Interpretable Off-Policy Learning via Hyperbox Search

Authors: Daniel Tschernutter, Tobias Hatt, Stefan Feuerriegel

ICML 2022

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | Using a simulation study, we demonstrate that our algorithm outperforms state-of-the-art methods from interpretable off-policy learning in terms of regret. Using real-world clinical data, we perform a user study with actual clinical experts, who rate our policies as highly interpretable. |
| Researcher Affiliation | Academia | ETH Zurich, Switzerland; LMU, Germany. |
| Pseudocode | Yes | Algorithm 1 IOPL |
| Open Source Code | Yes | We provide a publicly available implementation of IOPL in Python. For solving the LP (linear program) relaxations and the pricing problem, we use Gurobi 9.0. We stop IOPL if it exceeds l = 50 branch-and-bound iterations as given in Algorithm 1. We set a maximum time limit of 180 seconds for solving the pricing problem in our experiments. We emphasize that this time limit was never exceeded in our experiments, including the experiments with the real-world clinical data (see Section 4.5.2 for a discussion of the reasons). Code available at https://github.com/DanielTschernutter/IOPL |
| Open Datasets | Yes | We draw upon the AIDS Clinical Trial Group (ACTG) study 175 (Hammer et al., 1996). |
| Dataset Splits | Yes | For all baselines, we use 80% of the data for training and 20% for validation. |
| Hardware Specification | Yes | We run all of our experiments on a server with two 16-core Intel Xeon Gold 6242 processors, each with 2.8 GHz, and 192 GB of RAM. |
| Software Dependencies | Yes | We provide a publicly available implementation of IOPL in Python. For solving the LP (linear program) relaxations and the pricing problem, we use Gurobi 9.0. |
| Experiment Setup | Yes | The hyperparameters are given in Table 2. Table 2: Hyperparameter Grids (e.g., initial learning rate {10^-1, 10^-2, 10^-3, 10^-4}, batch size {128, full}, regularization parameter ρ {10^-2, 10^-3, 10^-4}) |
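The 80%/20% train/validation split reported for the baselines can be sketched as follows. This is a minimal illustration and not the paper's actual code; the function and variable names are our own.

```python
import random

def train_val_split(data, train_frac=0.8, seed=0):
    """Shuffle the dataset and split it into training and validation subsets."""
    rng = random.Random(seed)  # fixed seed for a reproducible split
    indices = list(range(len(data)))
    rng.shuffle(indices)
    cut = int(train_frac * len(data))
    train = [data[i] for i in indices[:cut]]
    val = [data[i] for i in indices[cut:]]
    return train, val

# Example: 100 samples -> 80 for training, 20 for validation.
data = list(range(100))
train, val = train_val_split(data)
```

In practice one would split feature/outcome arrays jointly (e.g., with scikit-learn's `train_test_split`), but the logic is the same.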
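The hyperparameter grids quoted from Table 2 imply an exhaustive search over all combinations. A hedged sketch of such a grid enumeration is below; the dictionary keys are our own labels for the reported grids, not identifiers from the authors' code.

```python
from itertools import product

# Hyperparameter grids as quoted from Table 2 of the paper.
grid = {
    "initial_learning_rate": [1e-1, 1e-2, 1e-3, 1e-4],
    "batch_size": [128, "full"],
    "rho": [1e-2, 1e-3, 1e-4],  # regularization parameter
}

def grid_configs(grid):
    """Yield one dict per combination of hyperparameter values."""
    keys = list(grid)
    for values in product(*(grid[k] for k in keys)):
        yield dict(zip(keys, values))

# 4 learning rates * 2 batch sizes * 3 regularization values = 24 configurations.
configs = list(grid_configs(grid))
```

Each configuration would then be trained and scored on the 20% validation split, keeping the best-performing one.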