Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Teaching Temporal Logics to Neural Networks
Authors: Christopher Hahn, Frederik Schmitt, Jens U. Kreber, Markus Norman Rabe, Bernd Finkbeiner
ICLR 2021 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We train a Transformer on the problem to directly predict a solution, i.e. a trace, to a given LTL formula. The training data is generated with classical solvers, which, however, only provide one of many possible solutions to each formula. We demonstrate that it is sufficient to train on those particular solutions to formulas, and that Transformers can predict solutions even to formulas from benchmarks from the literature on which the classical solver timed out. |
| Researcher Affiliation | Collaboration | Christopher Hahn, CISPA Helmholtz Center for Information Security, Saarbrücken, 66123 Saarland, Germany, EMAIL; Frederik Schmitt, CISPA Helmholtz Center for Information Security, Saarbrücken, 66123 Saarland, Germany, EMAIL; Jens U. Kreber, Saarland University, Saarbrücken, 66123 Saarland, Germany, EMAIL; Markus N. Rabe, Google Research, Mountain View, CA, USA, EMAIL; Bernd Finkbeiner, CISPA Helmholtz Center for Information Security, Saarbrücken, 66123 Saarland, Germany, EMAIL |
| Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | No | The paper does not include an explicit statement about releasing their code or a link to a repository for the work described. |
| Open Datasets | No | The paper describes how they generated their datasets ('Our datasets consist of pairs of satisfiable LTL formulas and satisfying symbolic traces generated with tools and automata constructions from the spot framework (Duret-Lutz et al., 2016).'), but it does not provide concrete access (link, DOI, or explicit statement of public availability) to these generated datasets. |
| Dataset Splits | Yes | We split this set into an 80% training set, a 10% validation set, and a 10% test set. |
| Hardware Specification | Yes | We trained on a single GPU (NVIDIA P100 or V100). |
| Software Dependencies | No | The paper mentions tools such as the 'spot framework' and 'Glucose 4', but it does not provide version numbers for these or any other software dependencies, which would be needed for reproducibility. |
| Experiment Setup | Yes | In general, our best performing models used 8 layers, 8 attention heads, and an FC size of 1024. We used a batch size of 400 and trained for 450K steps (130 epochs) for our specification pattern dataset, and a batch size of 768 and trained for 50K steps (48 epochs) for our random formula dataset. A hyperparameter study can be found in Appendix C. |
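The dataset split and hyperparameters quoted in the table above can be collected into a minimal sketch. This is illustrative only: the names (`TRAIN_CONFIG`, `make_splits`) and the shuffling/seeding behavior are assumptions, not taken from the paper, which does not specify how the 80/10/10 split was performed.

```python
import random

# Hyperparameters reported for the best-performing models on the
# specification pattern dataset (8 layers, 8 heads, FC size 1024,
# batch size 400, 450K steps / ~130 epochs).
TRAIN_CONFIG = {
    "num_layers": 8,
    "num_heads": 8,
    "fc_size": 1024,
    "batch_size": 400,
    "train_steps": 450_000,
}

def make_splits(examples, seed=0):
    """Shuffle and split examples 80/10/10 into train/val/test.

    The seeded shuffle is a hypothetical choice for determinism;
    the paper only states the 80/10/10 proportions.
    """
    examples = list(examples)
    random.Random(seed).shuffle(examples)
    n = len(examples)
    n_train = int(0.8 * n)
    n_val = int(0.1 * n)
    return (
        examples[:n_train],
        examples[n_train:n_train + n_val],
        examples[n_train + n_val:],
    )
```

For example, splitting 1000 formula/trace pairs with `make_splits` yields 800 training, 100 validation, and 100 test examples, matching the proportions reported in the paper.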