Safety Verification of Decision-Tree Policies in Continuous Time
Authors: Christian Schilling, Anna Lukina, Emir Demirović, Kim Guldstrand Larsen
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the effectiveness of our approach by verifying safety of several decision trees distilled to imitate neural-network policies for nonlinear systems. We implemented¹ our method in a tool based on JuliaReach [Bogomolov et al., 2019] for the Taylor-model algorithm (TM_reach in Algorithm 2) [Benet et al., 2019] and LazySets [Forets and Schilling, 2021] for the set computations. We demonstrate our approach on the quadrotor system (Figure 1(a)) and three classical control problems from the Gymnasium [Brockman et al., 2016]: cart-pole, acrobot, and mountain-car. (A hedged Julia sketch of the Taylor-model reachability step follows the table.) |
| Researcher Affiliation | Academia | Christian Schilling, Aalborg University, Aalborg, Denmark, christianms@cs.aau.dk; Anna Lukina, TU Delft, Delft, The Netherlands, a.lukina@tudelft.nl; Emir Demirović, TU Delft, Delft, The Netherlands, e.demirovic@tudelft.nl; Kim Guldstrand Larsen, Aalborg University, Aalborg, Denmark, kgl@cs.aau.dk |
| Pseudocode | Yes | Algorithm 1 shows our high-level reachability method, which can be used to solve the reach-avoid problem (Section 2.2). The algorithm is parametric in two procedures, post_T and post_f, which together compute an enclosure of the reachable states for one control cycle. (Algorithms 1, 2, and 3 are present in the paper; a hedged sketch of post_T applied to sets follows the table.) |
| Open Source Code | Yes | ¹The implementation and experiments are available at https://github.com/VeriXAI/Safety-Verification-of-Decision-Tree-Policies-in-Continuous-Time. |
| Open Datasets | Yes | We demonstrate our approach on the quadrotor system (Figure 1(a)) and three classical control problems from the Gymnasium [Brockman et al., 2016]: cart-pole, acrobot, and mountain-car. For the quadrotor system we collected 500 samples of state-action pairs from a neural-network policy used by Ivanov et al. [2019] and learned a decision tree of depth 10 using behavioral cloning. For the three other systems (cart-pole, acrobot, mountain-car) we first used deep Q-learning to train a two-layer convolutional neural network [Mnih et al., 2013] and then adopted the Viper algorithm [Bastani et al., 2018] to imitation-learn decision trees for these systems. (A hedged sketch of the behavioral-cloning step follows the table.) |
| Dataset Splits | No | The paper describes how decision tree policies were obtained (e.g., via distillation or imitation learning) but does not provide specific train/validation/test dataset splits used for training or evaluating these policies or for the verification process itself. |
| Hardware Specification | Yes | All experiments were conducted on a laptop with an i7 1.80 GHz CPU and 32 GB RAM. |
| Software Dependencies | No | The paper mentions 'JuliaReach', the 'Taylor-model algorithm', and 'LazySets' but does not specify version numbers for these software components or for Julia itself. |
| Experiment Setup | Yes | The control period is 0.2 time units and the number of control cycles is 30 (quadrotor). The control period is 0.02 time units (cart-pole). The control period is 0.2 time units and the time horizon is 100 time units (acrobot). We train a decision-tree policy of depth 10 imitating the neural network from [Ivanov et al., 2019]. We obtain two decision-tree policies and compare their performance. The system applies a nonlinear transformation to the state before passing it to the policy. (A hedged sketch of the cycle-by-cycle verification loop follows the table.) |
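To make the quoted tool stack concrete, the following is a minimal sketch of the continuous step (the paper's post_f) in Julia, using ReachabilityAnalysis.jl from the JuliaReach ecosystem, whose TMJets solver implements a Taylor-model algorithm. The pendulum-like dynamics, the dimensions, and the convention of carrying the control input as a constant third state variable are illustrative assumptions, not the paper's benchmark models.

```julia
using ReachabilityAnalysis  # JuliaReach; re-exports the LazySets set types

# Hypothetical nonlinear plant (NOT the paper's quadrotor): pendulum-like
# dynamics, with the control input held constant as state variable x[3].
@taylorize function plant!(dx, x, p, t)
    dx[1] = x[2]
    dx[2] = x[3] - sin(x[1])
    dx[3] = zero(x[3])  # the input does not change within one control cycle
    return dx
end

# post_f: Taylor-model enclosure of the states reachable from X0 within one
# control period τ (TMJets is JuliaReach's Taylor-model algorithm).
function post_f(X0, τ)
    prob = @ivp(x' = plant!(x), dim: 3, x(0) ∈ X0)
    return solve(prob, T=τ, alg=TMJets())
end
```

A call such as `post_f(Hyperrectangle(low=[0.9, -0.1, 0.0], high=[1.1, 0.1, 0.0]), 0.2)` then encloses all trajectories over one 0.2-time-unit control period.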
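The discrete step (the paper's post_T) must evaluate the decision tree on a set of states rather than on a point. Since tree predicates are axis-aligned tests x[i] ≤ c, a set can be pushed through a node by intersecting it with the matching half-space, so each reachable leaf yields one (action, subset) pair. The Node and Leaf types below are hypothetical stand-ins for a concrete tree representation; the set operations are standard LazySets calls.

```julia
using LazySets

# Hypothetical decision-tree representation: internal nodes test x[i] ≤ c.
struct Leaf; action::Float64; end
struct Node; i::Int; c::Float64; left::Any; right::Any; end

# post_T: propagate the set X through the tree, returning one (action, subset)
# pair per leaf that is reachable from some state in X.
function post_T(node, X)
    node isa Leaf && return [(node.action, X)]
    a = zeros(dim(X)); a[node.i] = 1.0
    Xl = intersection(X, HalfSpace(a, node.c))    # states with x[i] ≤ c
    Xr = intersection(X, HalfSpace(-a, -node.c))  # states with x[i] ≥ c
    out = Tuple{Float64, Any}[]
    isempty(Xl) || append!(out, post_T(node.left, Xl))
    isempty(Xr) || append!(out, post_T(node.right, Xr))
    return out
end
```

Keeping the boundary x[i] = c in both children overapproximates the tree's strict point semantics, which is sound for safety verification.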
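The distillation quoted above is plain behavioral cloning: supervised learning on state-action pairs queried from the network. Below is a minimal sketch with DecisionTree.jl; the stub network policy and the state dimension are placeholders, while the sample count (500) and tree depth (10) echo the paper's quadrotor setup.

```julia
using DecisionTree

# Stand-in for the trained neural-network policy (the paper queries the
# network from Ivanov et al. [2019]; this stub is purely illustrative).
nn_policy(x) = x[1] + x[2] > 0.0 ? 1 : 0

# 500 state-action samples, as in the paper's quadrotor experiment.
states  = randn(500, 6)                             # one row per sampled state
actions = [nn_policy(states[k, :]) for k in 1:500]  # labels from the network

# Behavioral cloning: fit a depth-10 tree to the (state, action) pairs.
# build_tree's positional arguments are (labels, features, n_subfeatures,
# max_depth); n_subfeatures = 0 means "consider all features at each split".
tree = build_tree(actions, states, 0, 10)
cloned_policy(x) = apply_tree(tree, x)
```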
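Finally, the two operators combine into the reach-avoid loop of Algorithm 1. The sketch below wires the hypothetical post_T and post_f from the previous snippets into a loop whose defaults (period 0.2, 30 cycles) mirror the quadrotor numbers; the box and zonotope overapproximations are simplifications for readability, and the paper's Algorithms 2 and 3 are more precise.

```julia
# Reach-avoid loop in the spirit of Algorithm 1: alternate post_T (tree) and
# post_f (Taylor models) for `cycles` control periods of length τ.  All
# helpers (post_T, post_f, the Node/Leaf tree) are the hypothetical
# definitions sketched above; `unsafe` is a 2-D LazySets set of bad states.
function verify(tree, X0, unsafe; τ = 0.2, cycles = 30)
    queue = Any[X0]  # 2-D state sets, one per explored tree branch
    for _ in 1:cycles
        next = Any[]
        for X in queue, (u, Xu) in post_T(tree, X)
            # append the chosen input as a constant third dimension and
            # coarsen to a box before handing the set to the TM solver
            X3 = box_approximation(Xu × Singleton([u]))
            F = overapproximate(post_f(X3, τ), Zonotope)
            for R in F  # check every reach set against the unsafe states
                isdisjoint(project(set(R), [1, 2]), unsafe) || return false
            end
            push!(next, project(set(F[end]), [1, 2]))  # drop input dimension
        end
        queue = next
    end
    return true  # no reach set intersects the unsafe states
end
```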