reproducibilityindex.ai

Learning Logic Programs by Discovering Higher-Order Abstractions

Authors: Céline Hocquette, Sebastijan Dumancic, Andrew Cropper

IJCAI 2024

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Our experiments on multiple domains, including program synthesis and visual reasoning, show that refactoring can improve the learning performance of an inductive logic programming system, specifically improving predictive accuracies by 27% and reducing learning times by 47%. We also show that STEVIE can discover abstractions that transfer to multiple domains.
Researcher Affiliation	Academia	University of Oxford; TU Delft
Pseudocode	Yes	Algorithm 1 STEVIE
Open Source Code	Yes	The experimental code and data are available at https://github.com/celinehocquette/ijcai24-stevie.
Open Datasets	Yes	We use a dataset of 176 program synthesis tasks and reserve 25% as held-out tasks. The experimental code and data are available at https://github.com/celinehocquette/ijcai24-stevie.
Dataset Splits	No	The paper mentions reserving 25% of tasks as 'held-out tasks' for testing but does not explicitly specify separate training, validation, and test splits with percentages or counts for reproduction beyond the held-out tasks.
Hardware Specification	Yes	We use a c6a AWS instance with 32 vCPUs and 64 GB of memory.
Software Dependencies	Yes	We use SWI-Prolog to execute the programs learned by STEVIE and HOPPER. STEVIE uses the CP-SAT solver [Perron and Furnon, 2019].
Experiment Setup	Yes	We set HOPPER to use at most three abstractions in a program. We allow HOPPER to use three threads. We allow STEVIE to discover abstractions with at most three higher-order variables. STEVIE uses the CP-SAT solver [Perron and Furnon, 2019]. STEVIE uses a single CPU. We use a c6a AWS instance with 32 vCPUs and 64 GB of memory. We use a maximum learning time of 15 minutes per task and return the best solution found by HOPPER in this time limit. We use a timeout of 1 hour for STEVIE and return the best refactoring found in this time limit. We repeat all the experiments 5 times and calculate the mean and standard error.
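
The Experiment Setup row says results are reported as the mean and standard error over 5 repetitions. The Python sketch below shows one common way to compute those two statistics. It assumes the standard error is the sample standard deviation divided by the square root of the number of runs, which the summary above does not confirm. The function name and the accuracy values are hypothetical and are not taken from the paper or its repository.

```python
import math
import statistics


def mean_and_standard_error(values):
    """Return (mean, standard error of the mean) for repeated measurements."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two repetitions")
    mean = statistics.fmean(values)
    # Assumption: sample standard deviation (n - 1 denominator) over sqrt(n).
    sem = statistics.stdev(values) / math.sqrt(n)
    return mean, sem


# Hypothetical accuracies from 5 repetitions (illustrative only).
runs = [0.80, 0.82, 0.78, 0.81, 0.79]
mean, sem = mean_and_standard_error(runs)
print(f"{mean:.3f} +/- {sem:.3f}")
```

When reproducing the reported numbers, the same function could be applied per task or per domain to the accuracies and learning times of the 5 runs.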