Learning Interpretable Models in the Property Specification Language

Authors: Rajarshi Roy, Dana Fisman, Daniel Neider

IJCAI 2020

Reproducibility assessment. Each entry lists the variable, the assessed result, and the supporting LLM response (quoted passages are taken from the paper).
Research Type: Experimental
"We have implemented our algorithm and performed a comparative study between the proposed method and the existing LTL learning algorithm. Our results illustrate the effectiveness of the proposed approach to provide succinct human-interpretable descriptions from examples."

Researcher Affiliation: Academia
"Max Planck Institute for Software Systems, Kaiserslautern, Germany; Ben-Gurion University, Be'er Sheva, Israel"

Pseudocode: Yes
"Algorithm 1: SAT-based learning algorithm for PSL"

Open Source Code: Yes
"We have implemented a prototype of our learning algorithm, named Flie-PSL (Formal Language Inference Engine for PSL), which is written in Python and uses Z3 [de Moura and Bjørner, 2008] as SAT solver. The tool is available at https://github.com/ifm-mpi/Flie-PSL."

Open Datasets: Yes
"The first benchmark suite is taken directly from [Neider and Gavran, 2018] and contains 1217 samples, which were generated from common LTL properties. The second benchmark suite simulates real-world PSL use-cases and contains 390 synthetic samples, which we have generated from PSL formulas that commonly appear in practice."

Dataset Splits: No
The paper describes how samples are generated and partitioned into positive and negative examples for learning, but it does not specify any explicit training/validation/test splits for model evaluation or hyperparameter tuning in the traditional sense.

Hardware Specification: Yes
"All experiments were conducted on a single core of an Intel Xeon E7-8857 V2 CPU (at 3.6 GHz) with a timeout of 1800 s."

Software Dependencies: No
The paper states that the tool "is written in Python and uses Z3 [de Moura and Bjørner, 2008] as SAT solver." While Z3 is named and cited, specific version numbers for Python or Z3 are not provided.

Experiment Setup: Yes
"All experiments were conducted on a single core of an Intel Xeon E7-8857 V2 CPU (at 3.6 GHz) with a timeout of 1800 s. ... we generate samples consisting of a number (ranging from 10 to 500) of ultimately periodic words uvω with |u| + |v| ≤ 15; thirdly, we partition the words in each sample into sets P and N depending on their satisfaction of ϕ. In total, the median size of the samples in the second benchmark suite is 100 words."
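The generation scheme quoted above can be sketched in a few lines: draw ultimately periodic words uvω with |u| + |v| ≤ 15 over a small alphabet, then partition them into P and N by checking a target formula. The sketch below is ours, not the paper's generator; it uses "F p" (eventually p) as a stand-in target, exploiting the fact that uvω satisfies F p exactly when p occurs somewhere in u or in v, since v repeats forever.

```python
# Hypothetical sketch of the sample-generation scheme described above:
# draw ultimately periodic words u v^omega with |u| + |v| <= 15 and
# partition them into P (positive) and N (negative) according to a
# fixed target formula.  A letter is a set of atomic propositions.
import random

def holds_eventually_p(u, v):
    # u v^omega satisfies "F p" iff p occurs in u or in v, because the
    # period v repeats forever.  Stand-in for a full PSL model checker.
    return any("p" in letter for letter in u + v)

def generate_sample(num_words, target=holds_eventually_p, seed=0):
    rng = random.Random(seed)
    alphabet = [frozenset(), frozenset({"p"}), frozenset({"q"})]
    positives, negatives = [], []
    for _ in range(num_words):
        # Split a total length of at most 15 between the prefix u and
        # the period v (v must be non-empty for the word to be infinite).
        total = rng.randint(2, 15)
        cut = rng.randint(1, total - 1)
        u = [rng.choice(alphabet) for _ in range(cut)]
        v = [rng.choice(alphabet) for _ in range(total - cut)]
        (positives if target(u, v) else negatives).append((u, v))
    return positives, negatives

P, N = generate_sample(100)
```

A learner is then asked to recover a formula that is satisfied by every word in P and violated by every word in N, which is how the benchmark samples exercise the algorithm.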