Learning Logic Programs by Discovering Higher-Order Abstractions

Authors: Céline Hocquette, Sebastijan Dumancic, Andrew Cropper

IJCAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments on multiple domains, including program synthesis and visual reasoning, show that refactoring can improve the learning performance of an inductive logic programming system, specifically improving predictive accuracies by 27% and reducing learning times by 47%. We also show that STEVIE can discover abstractions that transfer to multiple domains.
Researcher Affiliation | Academia | 1 University of Oxford, 2 TU Delft
Pseudocode | Yes | Algorithm 1 STEVIE
Open Source Code | Yes | The experimental code and data are available at https://github.com/celinehocquette/ijcai24-stevie.
Open Datasets | Yes | We use a dataset of 176 program synthesis tasks and reserve 25% as held-out tasks. The experimental code and data are available at https://github.com/celinehocquette/ijcai24-stevie.
Dataset Splits | No | The paper mentions reserving 25% of tasks as 'held-out tasks' for testing but does not explicitly specify separate training, validation, and test splits with percentages or counts for reproduction beyond the held-out tasks.
Hardware Specification | Yes | We use a c6a AWS instance with 32 vCPUs and 64GB of memory.
Software Dependencies | Yes | We use SWI-Prolog to execute the programs learned by STEVIE and HOPPER. STEVIE uses the CP-SAT solver [Perron and Furnon, 2019].
Experiment Setup | Yes | We set HOPPER to use at most three abstractions in a program. We allow HOPPER to use three threads. We allow STEVIE to discover abstractions with at most three higher-order variables. STEVIE uses the CP-SAT solver [Perron and Furnon, 2019]. STEVIE uses a single CPU. We use a c6a AWS instance with 32 vCPUs and 64GB of memory. We use a maximum learning time of 15 minutes per task and return the best solution found by HOPPER in this time limit. We use a timeout of 1 hour for STEVIE and return the best refactoring found in this time limit. We repeat all the experiments 5 times and calculate the mean and standard error.
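
The reported protocol (176 tasks with 25% held out, 5 repetitions, mean and standard error) can be illustrated with a small sketch. This is not the authors' code: the function names, the seeded random shuffle, the placeholder task ids, and the example accuracies are all assumptions made for illustration, and the standard error is assumed to be the sample standard deviation divided by the square root of the number of repetitions.

```python
import math
import random
import statistics


def split_tasks(task_ids, held_out_fraction=0.25, seed=0):
    """Shuffle task ids and reserve a fraction as held-out tasks.

    The seeded shuffle is an assumption; the paper does not state
    how the held-out tasks were selected.
    """
    shuffled = list(task_ids)
    random.Random(seed).shuffle(shuffled)
    n_held_out = round(len(shuffled) * held_out_fraction)
    return shuffled[n_held_out:], shuffled[:n_held_out]


def mean_and_standard_error(values):
    """Return the sample mean and the standard error of the mean."""
    mean = statistics.mean(values)
    # Standard error = sample standard deviation / sqrt(number of repetitions).
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return mean, sem


# 176 tasks with 25% held out, as reported; the integer ids are placeholders.
remaining, held_out = split_tasks(range(176))

# Hypothetical accuracies from 5 repetitions; NOT results from the paper.
example_accuracies = [0.80, 0.82, 0.78, 0.81, 0.79]
mean, sem = mean_and_standard_error(example_accuracies)

print(f"{len(remaining)} remaining tasks, {len(held_out)} held-out tasks")
print(f"accuracy = {mean:.3f} +/- {sem:.3f}")
```

With 176 tasks, a 25% hold-out corresponds to 44 held-out tasks and 132 remaining tasks. Fixing the seed makes the split repeatable across the 5 runs, which is one way to address the missing split details noted in the Dataset Splits row.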