Learning MDL Logic Programs from Noisy Data

Authors: Céline Hocquette, Andreas Niskanen, Matti Järvisalo, Andrew Cropper

AAAI 2024

Reproducibility Variable Result LLM Response
Research Type Experimental Our experiments on several domains, including drug design, game playing, and program synthesis, show that our approach can outperform existing approaches in terms of predictive accuracies and scale to moderate amounts of noise.
Researcher Affiliation Academia Céline Hocquette¹, Andreas Niskanen², Matti Järvisalo², Andrew Cropper¹; ¹University of Oxford, ²University of Helsinki; celine.hocquette@cs.ox.ac.uk, andreas.niskanen@helsinki.fi, matti.jarvisalo@helsinki.fi, andrew.cropper@cs.ox.ac.uk
Pseudocode Yes Algorithm 1: MAXSYNTH
    def maxsynth(bk, pos, neg):
        cons, promising, best_solution = {}, {}, {}
        size, max_mdl = 1, len(pos)
        while size <= max_mdl:
            h = generate(cons, size)
            if h == UNSAT:
                size += 1
                continue
            tp, fn, fp = test(pos, neg, bk, h)
            h_mdl = fn+fp+size(h)
            if h_mdl < max_mdl:
                best_solution = h
                max_mdl = h_mdl-1
            if tp>0 and not_rec(h) and not_pi(h):
                promising += h
                combi = combine(promising, max_mdl)
                if combi != UNSAT:
                    best_solution = combi
                    tp, fn, fp = test(pos, neg, bk, combi)
                    max_mdl = fn+fp+size(combi)-1
            cons += constrain(h, fn, fp)
        return best_solution
Open Source Code Yes The experimental code and data are available at https://github.com/celinehocquette/aaai24-maxsynth.
Open Datasets Yes IGGP. The goal of inductive general game playing (IGGP) (Cropper, Evans, and Law 2020) is to induce rules to explain game traces from the general game playing competition (Genesereth and Björnsson 2013). Program synthesis. We use a program synthesis dataset (Cropper and Morel 2021). Zendo. Zendo is an inductive game where the goal is to find a rule by building structures of pieces. The game interests cognitive scientists (Bramley et al. 2018). Alzheimer. These real-world tasks (King, Sternberg, and Srinivasan 1995) involve learning rules describing four properties desirable for drug design against Alzheimer's disease. WN18RR. WN18RR (Bordes et al. 2013) is a real-world knowledge base with 11 relations from WordNet.
Dataset Splits No The paper mentions evaluating on "training examples" and "unseen test data" and adding noise to "training examples," but it does not specify explicit dataset split ratios or methodologies (e.g., 80/20 split, random seed, k-fold cross-validation).
Hardware Specification Yes We use an 8-Core 3.2 GHz Apple M1 and a single CPU.
Software Dependencies No The paper states "MAXSYNTH uses the UWrMaxSat solver (Piotrów 2020) in the combine stage" and "POPPER uses Clingo (Gebser et al. 2014)," but it does not provide explicit version numbers for these software dependencies or any other ancillary software.
Experiment Setup No The paper states "We measure predictive accuracy... and learning time given a maximum learning time of 20 minutes," but it does not provide specific hyperparameter values (e.g., learning rate, batch size, number of epochs) or detailed model training configurations.
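
The cost that Algorithm 1 (Pseudocode row above) assigns to a hypothesis is `fn + fp + size(h)`: the number of misclassified training examples plus the size of the program. The Python sketch below illustrates only that scoring and the bound-tightening step, under stated assumptions. `Candidate`, `mdl_cost`, and `pick_best` are hypothetical names introduced for illustration; they are not taken from the authors' code, and the sketch omits the generate, combine, and constrain stages entirely.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Candidate:
    """Hypothetical stand-in for an already-tested hypothesis."""
    name: str
    size: int  # program size (assumed here to be a literal count)
    fn: int    # false negatives: positive examples not covered
    fp: int    # false positives: negative examples covered


def mdl_cost(c: Candidate) -> int:
    # Same quantity as h_mdl in the pseudocode: fn + fp + size(h).
    return c.fn + c.fp + c.size


def pick_best(candidates: Iterable[Candidate], num_pos: int) -> Optional[Candidate]:
    # As in the pseudocode, the bound starts at len(pos) and is tightened
    # to h_mdl - 1 whenever a hypothesis with cost below the bound is found.
    best, max_mdl = None, num_pos
    for c in candidates:
        cost = mdl_cost(c)
        if cost < max_mdl:
            best, max_mdl = c, cost - 1
    return best


# Toy data, invented for illustration only.
toy = [
    Candidate("overfit_free", size=3, fn=2, fp=1),
    Candidate("exact", size=4, fn=0, fp=0),
    Candidate("too_general", size=2, fn=5, fp=5),
]
chosen = pick_best(toy, num_pos=10)
```

On this toy data a small program that misclassifies many examples ("too_general") costs more than a slightly larger program that misclassifies none, which is the trade-off the cost is meant to capture on noisy data.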