Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Teaching Temporal Logics to Neural Networks

Authors: Christopher Hahn, Frederik Schmitt, Jens U. Kreber, Markus Norman Rabe, Bernd Finkbeiner

ICLR 2021 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We train a Transformer on the problem to directly predict a solution, i.e. a trace, to a given LTL formula. The training data is generated with classical solvers, which, however, only provide one of many possible solutions to each formula. We demonstrate that it is sufficient to train on those particular solutions to formulas, and that Transformers can predict solutions even to formulas from benchmarks from the literature on which the classical solver timed out.
Researcher Affiliation | Collaboration | Christopher Hahn, CISPA Helmholtz Center for Information Security, Saarbrücken, 66123 Saarland, Germany; Frederik Schmitt, CISPA Helmholtz Center for Information Security, Saarbrücken, 66123 Saarland, Germany; Jens U. Kreber, Saarland University, Saarbrücken, 66123 Saarland, Germany; Markus N. Rabe, Google Research, Mountain View, CA, USA; Bernd Finkbeiner, CISPA Helmholtz Center for Information Security, Saarbrücken, 66123 Saarland, Germany
Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks.
Open Source Code | No | The paper does not include an explicit statement about releasing its code or a link to a repository for the work described.
Open Datasets | No | The paper describes how the datasets were generated ('Our datasets consist of pairs of satisfiable LTL formulas and satisfying symbolic traces generated with tools and automata constructions from the spot framework (Duret-Lutz et al., 2016).'), but it does not provide concrete access (link, DOI, or explicit statement of public availability) to these generated datasets.
Dataset Splits | Yes | We split this set into an 80% training set, a 10% validation set, and a 10% test set.
Hardware Specification | Yes | We trained on a single GPU (NVIDIA P100 or V100).
Software Dependencies | No | The paper mentions tools such as the spot framework and Glucose 4 but does not provide version numbers for these or other software dependencies, which are required for reproducibility.
Experiment Setup | Yes | In general, our best performing models used 8 layers, 8 attention heads, and an FC size of 1024. We used a batch size of 400 and trained for 450K steps (130 epochs) for our specification pattern dataset, and a batch size of 768 and trained for 50K steps (48 epochs) for our random formula dataset. A hyperparameter study can be found in Appendix C.
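The reported setup (an 80/10/10 train/validation/test split; 8 layers, 8 attention heads, FC size 1024; batch size 400 for 450K steps on the specification-pattern dataset) can be summarized in a short sketch. This is illustrative only: the class and function names below are not from the authors' code, and the paper does not describe how the split was implemented, so the shuffling logic here is an assumption.

```python
import random
from dataclasses import dataclass


@dataclass
class TrainConfig:
    # Hyperparameters as reported in the paper's experiment setup
    # (specification-pattern dataset variant).
    layers: int = 8
    attention_heads: int = 8
    fc_size: int = 1024
    batch_size: int = 400
    train_steps: int = 450_000


def split_dataset(pairs, seed=0):
    """Split (LTL formula, satisfying trace) pairs 80/10/10.

    The ratios come from the paper; the shuffle-then-slice
    implementation is a hypothetical sketch.
    """
    rng = random.Random(seed)
    shuffled = pairs[:]
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_train, n_val = int(0.8 * n), int(0.1 * n)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])


# Toy placeholder data standing in for generated formula/trace pairs.
examples = [(f"formula_{i}", f"trace_{i}") for i in range(1000)]
train, val, test = split_dataset(examples)
```

With 1000 toy examples this yields 800/100/100 train/validation/test pairs; the `TrainConfig` fields mirror the reported hyperparameters.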