Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.
Neuro-Symbolic Class Expression Learning
Authors: Caglar Demir, Axel-Cyrille Ngonga Ngomo
IJCAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our extensive experiments on 4 benchmark datasets and 390 learning problems suggest that DRILL converges to goal states at least 2.7 times faster than state-of-the-art models on all learning problems. The results of our statistical significance test confirms that DRILL converges to goal states significantly faster (p-value < 1%) than state-of-the-art models on all benchmark datasets. |
| Researcher Affiliation | Academia | Caglar Demir, Axel-Cyrille Ngonga Ngomo. Data Science Research Group, Paderborn University |
| Pseudocode | Yes | Algorithm 1 DRILL with deep Q-learning training procedure |
| Open Source Code | Yes | We provide an open-source implementation of DRILL, including pre-trained models, training and evaluation scripts. https://github.com/dice-group/DRILL |
| Open Datasets | Yes | We used four benchmark datasets (Family, Carcinogenesis, Mutagenesis and Biopax) [Bin et al., 2016; Fanizzi et al., 2018]. |
| Dataset Splits | No | The paper discusses training procedures and evaluation on benchmark datasets and randomly generated learning problems, but it does not specify explicit train/validation/test splits with percentages or sample counts for any of these datasets for general reproducibility. It mentions random undersampling for training data generation but not a standard validation split. |
| Hardware Specification | No | The paper mentions 'multi-CPUs or -GPUs' in the context of leveraging modern parallel compute architectures and 'single CPU' for state-of-the-art models, but it does not provide specific models (e.g., NVIDIA A100, Intel Xeon) or detailed hardware specifications used for their experiments. |
| Software Dependencies | No | The paper mentions using components like 'dice-embeddings framework' and adapting 'deep Q-Network proposed in [Mnih et al., 2015]' but does not provide specific version numbers for any software, libraries, or frameworks used in their experimental setup. |
| Experiment Setup | Yes | Throughout our experiments, DRILL was trained with a fixed configuration: 32 input channels, (3x3) kernel. In our experiments, we set maxlen = 5. We set the maximum runtime to 3 seconds as models often reach good solutions within 1.5 seconds [Lehmann and Hitzler, 2010]. Approaches were configured to terminate as soon as they found a goal state (i.e., a state with F1-score = 1.0). |
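
The stopping rule quoted in the Experiment Setup row (terminate at a goal state with F1-score = 1.0, or when the 3-second runtime budget is exhausted) can be illustrated with a minimal Python sketch. This is not the DRILL implementation: the names `f1_score` and `search`, the candidate representation (a name paired with the set of individuals it retrieves), and the example data are all hypothetical, chosen only to make the termination logic concrete.

```python
import time
from typing import Iterable, Optional, Set, Tuple


def f1_score(retrieved: Set[str], positives: Set[str], negatives: Set[str]) -> float:
    """F1 of a candidate expression, given the individuals it retrieves."""
    tp = len(retrieved & positives)   # positive examples covered
    fp = len(retrieved & negatives)   # negative examples wrongly covered
    fn = len(positives - retrieved)   # positive examples missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def search(
    candidates: Iterable[Tuple[str, Set[str]]],
    positives: Set[str],
    negatives: Set[str],
    max_runtime: float = 3.0,  # seconds, matching the budget quoted above
) -> Tuple[Optional[str], float]:
    """Return the best candidate seen; stop early at a goal state or timeout."""
    start = time.monotonic()
    best_name, best_f1 = None, -1.0
    for name, retrieved in candidates:
        if time.monotonic() - start > max_runtime:
            break  # runtime budget exhausted
        score = f1_score(retrieved, positives, negatives)
        if score > best_f1:
            best_name, best_f1 = name, score
        if best_f1 == 1.0:
            break  # goal state: F1-score = 1.0
    return best_name, best_f1


# Hypothetical toy learning problem (assumption, not from the paper).
positives = {"a", "b"}
negatives = {"c"}
candidates = [("C1", {"a", "c"}), ("C2", {"a", "b"}), ("C3", {"a"})]
result = search(candidates, positives, negatives)
```

In this toy run the loop stops at `C2` without evaluating `C3`, since `C2` covers every positive example and no negative one.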