Faster Smarter Proof by Induction in Isabelle/HOL

Authors: Yutaka Nagashima

IJCAI 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental Our evaluation based on 1,095 inductive problems from 22 source files shows that sem ind improves the accuracy of recommendation from 20.1% to 38.2% for the most promising candidates within 5.0 seconds of timeout compared to its predecessor while decreasing the median value of execution time from 2.79 seconds to 1.06 seconds.
Researcher Affiliation Academia Yutaka Nagashima Yale-NUS College, National University of Singapore University of Innsbruck Czech Institute of Informatics, Robotics, and Cybernetics, Czech Technical University in Prague yutaka@yale-nus.edu.sg
Pseudocode No The paper includes code examples (Program 1, 2, 3) and a flowchart, but no structured pseudocode or algorithm blocks.
Open Source Code Yes users only have to download the relevant Isabelle files from our public Git Hub repository1 and install sem ind using the standard Isabelle command. 1https://github.com/data61/PSL
Open Datasets Yes As our evaluation target, we use 22 Isabelle theory files with 1,095 applications of the induct tactic from the Archive of Formal Proofs (AFP) [Klein et al., 2004].
Dataset Splits No The paper states the total number of problems used for evaluation (1,095 inductive problems from 22 theory files) but does not specify any train/validation/test splits for these problems.
Hardware Specification Yes All evaluations are conducted on a Mac Book Pro (15-inch, 2019) with 2.6 GHz Intel Core i7 6-core memory 32 GB 2400 MHz DDR4.
Software Dependencies No The paper mentions Isabelle/HOL, Isabelle/jEdit, and Se LFi E, but does not provide specific version numbers for these or any other software dependencies used in the experiments.
Experiment Setup Yes sem ind applies 36 pre-defined heuristics written in Se LFi E... sem ind applies 8 pre-defined Se LFi E heuristics to judge the validity of generalisation... Each heuristic is implemented as an assertion on inductive problems and arguments of the induct tactic, and each assertion is tagged with a value... sem ind sums up such points from the 36 heuristics to compute the score for each sequence... selects the five most promising sequences for further processing... presents the 10 most promising sequences... within 5.0 seconds of timeout.