Safety Verification of Decision-Tree Policies in Continuous Time

Authors: Christian Schilling, Anna Lukina, Emir Demirović, Kim Larsen

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate the effectiveness of our approach by verifying safety of several decision trees distilled to imitate neural-network policies for nonlinear systems. We implemented our method in a tool based on JuliaReach [Bogomolov et al., 2019] for the Taylor-model algorithm (TM_reach in Algorithm 2) [Benet et al., 2019] and LazySets [Forets and Schilling, 2021] for the set computations. We demonstrate our approach on the quadrotor system (Figure 1(a)) and three classical control problems from Gymnasium [Brockman et al., 2016]: cart-pole, acrobot, and mountain car.
Researcher Affiliation | Academia | Christian Schilling, Aalborg University, Aalborg, Denmark, christianms@cs.aau.dk; Anna Lukina, TU Delft, Delft, The Netherlands, a.lukina@tudelft.nl; Emir Demirović, TU Delft, Delft, The Netherlands, e.demirovic@tudelft.nl; Kim Guldstrand Larsen, Aalborg University, Aalborg, Denmark, kgl@cs.aau.dk
Pseudocode | Yes | Algorithm 1 shows our high-level reachability method, which can be used to solve the reach-avoid problem (Section 2.2). The algorithm is parametric in two procedures, post_T and post_f, which together compute an enclosure of the reachable states for one control cycle. (Algorithms 1, 2, and 3 are present in the paper; see Sketch 1 after the table.)
Open Source Code | Yes | The implementation and experiments are available at https://github.com/VeriXAI/Safety-Verification-of-Decision-Tree-Policies-in-Continuous-Time.
Open Datasets | Yes | We demonstrate our approach on the quadrotor system (Figure 1(a)) and three classical control problems from Gymnasium [Brockman et al., 2016]: cart-pole, acrobot, and mountain car. For the quadrotor system we collected 500 samples of state-action pairs from a neural-network policy used by Ivanov et al. [2019] and learned a decision tree of depth 10 using behavioral cloning. For the three other systems (cart-pole, acrobot, mountain car) we first used deep Q-learning to train a two-layer convolutional neural network [Mnih et al., 2013] and then adopted the Viper algorithm [Bastani et al., 2018] to imitation-learn decision trees for these systems. (See Sketch 2 after the table.)
Dataset Splits | No | The paper describes how the decision-tree policies were obtained (e.g., via distillation or imitation learning) but does not provide specific train/validation/test splits used for training or evaluating these policies, or for the verification process itself.
Hardware Specification | Yes | All experiments were conducted on a laptop with an i7 1.80 GHz CPU and 32 GB RAM.
Software Dependencies | No | The paper mentions JuliaReach, the Taylor-model algorithm, and LazySets but does not specify version numbers for these software components or for Julia itself.
Experiment Setup | Yes | Quadrotor: the control period is 0.2 time units and the number of control cycles is 30. Cart-pole: the control period is 0.02 time units. Acrobot: the control period is 0.2 time units and the time horizon is 100 time units. We train a decision-tree policy of depth 10 imitating the neural network from Ivanov et al. [2019]. We obtain two decision-tree policies and compare their performance. The system applies a nonlinear transformation to the state before passing it to the policy. (See Sketch 3 after the table.)
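
Sketch 1 (re: Pseudocode). Algorithm 1 is parametric in post_T, the policy successor, and post_f, the continuous successor, which together cover one control cycle. Below is a minimal Python sketch of that loop under strong simplifying assumptions: axis-aligned decision-tree predicates, plain interval boxes instead of the paper's Taylor-model enclosures, and a caller-supplied post_f stub. All names here (Box, Node, verify) are illustrative, not the paper's or the tool's API.

# Hypothetical sketch of the Algorithm 1 control loop: split the current box of
# states through the decision tree (post_T), propagate each part through the
# dynamics (post_f), and check intersection with the avoid set. Interval boxes
# replace the Taylor-model enclosures used by the actual Julia tool.
from dataclasses import dataclass

@dataclass
class Box:
    lo: list  # lower bounds per dimension
    hi: list  # upper bounds per dimension

    def intersects(self, other: "Box") -> bool:
        return all(l1 <= h2 and l2 <= h1
                   for l1, h1, l2, h2 in zip(self.lo, self.hi, other.lo, other.hi))

@dataclass
class Node:
    dim: int = -1         # split dimension (predicate x[dim] <= thr)
    thr: float = 0.0      # split threshold
    left: "Node" = None   # subtree for x[dim] <= thr
    right: "Node" = None  # subtree for x[dim] > thr
    action: int = None    # set at leaves only

def post_T(X: Box, node: Node):
    """Enumerate (sub-box, action) pairs for all tree leaves reachable from X."""
    if node.action is not None:        # leaf: the whole sub-box maps to one action
        yield X, node.action
        return
    if X.lo[node.dim] <= node.thr:     # part of X satisfies the predicate
        hi = X.hi.copy(); hi[node.dim] = min(hi[node.dim], node.thr)
        yield from post_T(Box(X.lo.copy(), hi), node.left)
    if X.hi[node.dim] > node.thr:      # part of X violates the predicate
        lo = X.lo.copy(); lo[node.dim] = max(lo[node.dim], node.thr)
        yield from post_T(Box(lo, X.hi.copy()), node.right)

def verify(X0: Box, tree: Node, post_f, avoid: Box, cycles: int) -> bool:
    """Return True if no reachable box intersects the avoid set."""
    frontier = [X0]
    for _ in range(cycles):
        successors = []
        for X in frontier:
            for Xa, action in post_T(X, tree):
                Xnext = post_f(Xa, action)  # continuous-time enclosure (stub)
                if Xnext.intersects(avoid):
                    return False            # possibly unsafe
                successors.append(Xnext)
        frontier = successors
    return True

# Example: 2D state, one-split tree, dynamics stub that inflates the box.
inflate = lambda X, a: Box([l - 0.1 for l in X.lo], [h + 0.1 for h in X.hi])
tree = Node(dim=0, thr=0.0, left=Node(action=0), right=Node(action=1))
print(verify(Box([0.2, 0.2], [0.4, 0.4]), tree, inflate,
             Box([2.0, 2.0], [3.0, 3.0]), cycles=3))

The actual tool keeps Taylor-model flowpipes via JuliaReach and performs the set operations with LazySets in Julia; this box-based version only mirrors the control structure of Algorithm 1.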
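
Sketch 2 (re: Open Datasets). The quadrotor policy was obtained by behavioral cloning: 500 state-action samples from a neural-network teacher, fit into a tree of depth 10. Below is a minimal scikit-learn sketch of that step; teacher_policy and sample_state are hypothetical placeholders for the neural-network policy of Ivanov et al. [2019] and the sampling scheme, neither of which is specified in the excerpt. The Viper step used for the other three benchmarks additionally reweights and aggregates samples and is not shown.

# Hypothetical behavioral-cloning sketch: label sampled states with a teacher
# policy's actions and fit a depth-10 decision tree.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

def sample_state():
    # Placeholder: draw a state from some region of interest.
    return rng.uniform(-1.0, 1.0, size=6)

def teacher_policy(state):
    # Placeholder for the neural-network policy being imitated.
    return int(state[0] + state[1] > 0)

states = np.array([sample_state() for _ in range(500)])  # 500 samples
actions = np.array([teacher_policy(s) for s in states])  # teacher labels

tree = DecisionTreeClassifier(max_depth=10).fit(states, actions)
print(tree.get_depth(), tree.predict(states[:3]))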
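
Sketch 3 (re: Experiment Setup). The per-benchmark settings quoted above, collected as a small configuration table; fields the excerpt does not state are left as None rather than guessed.

# Experiment settings as quoted in the paper's evaluation section.
EXPERIMENTS = {
    "quadrotor":    {"control_period": 0.2,  "control_cycles": 30,   "time_horizon": None},
    "cart-pole":    {"control_period": 0.02, "control_cycles": None, "time_horizon": None},
    "acrobot":      {"control_period": 0.2,  "control_cycles": None, "time_horizon": 100.0},
    "mountain-car": {"control_period": None, "control_cycles": None, "time_horizon": None},
}

for name, cfg in EXPERIMENTS.items():
    print(f"{name}: period={cfg['control_period']}, "
          f"cycles={cfg['control_cycles']}, horizon={cfg['time_horizon']}")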