Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Ontology Reasoning with Deep Neural Networks

Authors: Patrick Hohenecker, Thomas Lukasiewicz

JAIR 2020 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental We present the outcomes of several experiments, which show that our model is able to learn to perform highly accurate ontology reasoning on very large, diverse, and challenging benchmarks. Furthermore, it turned out that the suggested approach suffers much less from different obstacles that prohibit logic-based symbolic reasoning, and, at the same time, is surprisingly plausible from a biological point of view.
Researcher Affiliation Academia Patrick Hohenecker EMAIL Department of Computer Science University of Oxford, UK Thomas Lukasiewicz EMAIL Department of Computer Science University of Oxford, UK
Pseudocode Yes Algorithm 1: Generating individual embeddings. Input: an ontological knowledge base KB = ⟨Σ, D⟩ with individuals(KB) = {i1, i2, . . . , iM}, a number of update iterations N, and (optionally) a matrix of initial embeddings E. Output: the generated embeddings E.
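The quoted description of Algorithm 1 only states its inputs (a knowledge base, an iteration count N, and optional initial embeddings) and its output (the embedding matrix E). A minimal sketch of such an iterative embedding-update loop is shown below; the neighbour-averaging update is purely illustrative, not the RRN's actual learned update function, and all names here are assumptions.

```python
import numpy as np

def generate_embeddings(individuals, triples, n_iters, dim=32, init=None, seed=0):
    """Illustrative sketch in the spirit of Algorithm 1: each individual's
    embedding is refreshed for n_iters rounds from the embeddings of the
    individuals it is related to in the knowledge base.

    individuals: list of individual identifiers (i1, ..., iM)
    triples:     iterable of (subject, relation, object) facts
    init:        optional (M, dim) matrix of initial embeddings
    """
    rng = np.random.default_rng(seed)
    E = init.copy() if init is not None else rng.normal(size=(len(individuals), dim))
    index = {ind: k for k, ind in enumerate(individuals)}
    for _ in range(n_iters):
        new_E = E.copy()
        for subj, _rel, obj in triples:
            s, o = index[subj], index[obj]
            # Stand-in update: mix each embedding with its neighbour's.
            # The real model applies a learned, relation-specific update.
            new_E[s] = 0.5 * (E[s] + E[o])
            new_E[o] = 0.5 * (E[o] + E[s])
        E = new_E
    return E
```

With a three-individual toy base and two iterations, `generate_embeddings(["a", "b", "c"], [("a", "parentOf", "b")], n_iters=2, dim=4)` returns a (3, 4) matrix in which only the embeddings of "a" and "b" have been mixed.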
Open Source Code Yes the code that we used to generate our toy datasets, including the employed formal ontologies, is available as open source from https://github.com/phohenecker/familytree-data-gen and https://github.com/phohenecker/country-data-gen, respectively.
Open Datasets Yes All the datasets that have been used in our experiments are available from https://paho.at/rrn.
Dataset Splits Yes Finally, notice that we partitioned each of our datasets into pairwise disjoint sets for training and testing, which ensures that the model actually learns to perform ontology reasoning as opposed to just memorizing the data. Also, we would like to point out that, unfortunately, Ebrahimi et al. (2018), who cited our work, mistakenly claimed that we trained and evaluated our models on one and the same data, which is not the case.
Hardware Specification No However, for our experiments, we used a straightforward CPU-only implementation of the RRN model, which did not make use of any optimization or parallelization strategies.
Software Dependencies No No specific version numbers for software libraries or frameworks are provided. The text mentions "Adam (Kingma & Ba, 2015)" as an optimization method, but not as a software dependency with a version.
Experiment Setup Yes As part of the conducted experiments, we performed a grid search in order to determine an appropriate set of hyperparameters, all of which are reported in Table 6. Interestingly, the RRN seems to be broadly task-agnostic, as similar values worked well for all the considered datasets. Merely the size of the individual embeddings as well as the number of update iterations had to be adjusted based on the respective reasoning task. Furthermore, there was no need to manually create mini-batches of training data, as the single training samples contained numerous triples for each class and relation type already. All our models were trained by means of Adam (Kingma & Ba, 2015) with an initial learning rate of 0.001, β1 set to 0.9, and β2 set to 0.999. Furthermore, all MLPs had a single hidden layer of ReLU units and sigmoid units on the output layers. The sizes of the hidden layers were set to the average of input and output sizes for all of them. To prevent overfitting, we employed weight decay, which means that we added the term λ‖θ‖² to the computed loss, where θ represents a vector of all model parameters, and λ ∈ ℝ≥0 is a hyperparameter.
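The quoted setup fixes three concrete, reproducible choices: the Adam hyperparameters, the hidden-layer sizing rule (average of input and output sizes), and the weight-decay term λ‖θ‖² added to the loss. A small sketch of these three pieces, with hypothetical helper names, might look like:

```python
import numpy as np

# Optimizer settings exactly as reported in the quoted setup.
ADAM_CONFIG = {"learning_rate": 1e-3, "beta1": 0.9, "beta2": 0.999}

def hidden_size(n_in, n_out):
    # Hidden-layer size = average of input and output sizes (rounded down
    # here; the paper does not say how non-integer averages were handled).
    return (n_in + n_out) // 2

def loss_with_weight_decay(base_loss, params, lam):
    # Weight decay as described: add lam * ||theta||^2, where theta is the
    # concatenation of all model parameters and lam >= 0.
    return base_loss + lam * sum(float(np.sum(p ** 2)) for p in params)
```

For example, an MLP mapping 10 inputs to 4 outputs would get a hidden layer of `hidden_size(10, 4)` = 7 units, and `loss_with_weight_decay(1.0, [np.ones(2)], 0.1)` adds 0.1 · 2 = 0.2 to the base loss.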