Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Pay-As-You-Go Description Logic Reasoning by Coupling Tableau and Saturation Procedures

Authors: Andreas Steigmiller, Birte Glimm

JAIR 2015

Reproducibility variables, classification results, and supporting excerpts (LLM responses):
Research Type: Experimental
"Our detailed evaluation shows that this combination significantly improves the reasoning performance for a wide range of ontologies. ... We present the results of a detailed evaluation including comparisons with other approaches and state-of-the-art reasoners (Section 7)... The evaluation was carried out on a Dell PowerEdge R420 server running with two Intel Xeon E5-2440 hexa-core processors at 2.4 GHz with Hyper-Threading and 144 GB RAM under a 64-bit Ubuntu 12.04.2 LTS. Our evaluation focuses on classification, which is a central reasoning task supported by many reasoners and, thus, it is ideal for comparing results. In principle, we only measured the classification time..."

Researcher Affiliation: Academia
"Andreas Steigmiller EMAIL Birte Glimm EMAIL Institute of Artificial Intelligence, University of Ulm, Germany"

Pseudocode: Yes
"The tableau algorithm works by decomposing/unfolding concepts in the completion graph with a set of expansion rules (see Table 1). Each rule application can add new concepts to node labels and/or new nodes and edges to the completion graph... Table 1: Tableau expansion rules for normalised SROIQ knowledge bases. ... Table 2: Saturation rules for (partially) handling SROIQ knowledge bases. ... Table 3: Rules for detecting nodes with tight at-most restrictions and nodes with nominal dependency in the saturation graph. ... Table 4: Rules for detecting incompleteness in the saturation graph."

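For readers unfamiliar with tableau calculi, the expansion rules referenced in this excerpt follow the standard pattern from the description logic literature. As an illustration, here is the usual conjunction rule in that style; this is a textbook formulation, not necessarily verbatim from the paper's Table 1, which covers the full SROIQ rule set:

```latex
% Standard tableau conjunction rule: if a node label contains a
% conjunction but not both conjuncts, add the conjuncts to the label.
\[
\sqcap\text{-rule: if } C_1 \sqcap C_2 \in \mathcal{L}(x)
\text{ and } \{C_1, C_2\} \not\subseteq \mathcal{L}(x),
\text{ then set } \mathcal{L}(x) := \mathcal{L}(x) \cup \{C_1, C_2\}
\]
```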
Open Source Code: Yes
"We extended Konclude (Steigmiller, Liebig, & Glimm, 2014) with the saturation procedure shown in Section 3 and with the optimisations presented in Sections 4 and 5. ... Konclude is freely available at http://www.konclude.com/ ... The test corpus and the evaluated version(s) of Konclude v0.6.1 can be found online at http://www.derivo.de/en/products/konclude/paper-support-pages/tableau-saturation-coupling.html"

Open Datasets: Yes
"The evaluation uses a large test corpus of ontologies, which has been obtained by collecting all downloadable and parsable ontologies from the Gardiner ontology suite (Gardiner, Horrocks, & Tsarkov, 2006), the NCBO BioPortal (Whetzel et al., 2011), the National Cancer Institute thesaurus (NCIt) archive (National Cancer Institute, 2003), the Open Biological Ontologies (OBO) Foundry (Smith et al., 2007), the Oxford ontology library (Information Systems Group, 2012)... the ORE 2014 dataset (Matentzoglu & Parsia, 2014)."

Dataset Splits: No
The paper uses a 'test corpus' of ontologies and 'selected benchmark ontologies' for evaluation but does not describe any training/test/validation splits; the evaluated task is classification, not learning. For example: "Our evaluation focuses on classification, which is a central reasoning task supported by many reasoners and, thus, it is ideal for comparing results."

Hardware Specification: Yes
"The evaluation was carried out on a Dell PowerEdge R420 server running with two Intel Xeon E5-2440 hexa-core processors at 2.4 GHz with Hyper-Threading and 144 GB RAM under a 64-bit Ubuntu 12.04.2 LTS."

Software Dependencies: Yes
"We extended Konclude (Steigmiller, Liebig, & Glimm, 2014) with the saturation procedure shown in Section 3... Konclude is freely available at http://www.konclude.com/ ... The test corpus and the evaluated version(s) of Konclude v0.6.1 can be found online at http://www.derivo.de/en/products/konclude/paper-support-pages/tableau-saturation-coupling.html. ... We used the OWL API for parsing... We used MORe in combination with ELK 0.4.1 (Kazakov, Krötzsch, & Simančík, 2014) and HermiT 1.3.8, but other combinations are also possible... FaCT++ 1.6.3 (Tsarkov & Horrocks, 2006), HermiT 1.3.8 (Glimm, Horrocks, Motik, Stoilos, & Wang, 2014), MORe 0.1.6 (Armas Romero et al., 2012), and Pellet 2.3.1 (Sirin, Parsia, Cuenca Grau, Kalyanpur, & Katz, 2007)."

Experiment Setup: Yes
"For the evaluation of the ontology repositories, we used the time limit of 5 minutes. For the selected benchmark ontologies, we cancelled the classification task after 15 minutes... we averaged the results for the selected benchmark ontologies over 3 separate runs... we configured all reasoners to use only one worker thread, which allows for a comparison independently of the number of CPU cores and facilitates the presentation of the improvements through saturation."
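The protocol in this excerpt (per-corpus time limits, benchmark results averaged over 3 runs, timeouts treated as cancellations) can be sketched as follows. This is an illustrative reconstruction, not the authors' actual harness; `run_classification` is a hypothetical stand-in for invoking a reasoner such as Konclude:

```python
import statistics

TIMEOUT_REPOSITORY = 5 * 60   # seconds, for repository ontologies
TIMEOUT_BENCHMARK = 15 * 60   # seconds, for selected benchmark ontologies
RUNS = 3                      # benchmark times are averaged over 3 runs


def run_classification(ontology, timeout):
    # Hypothetical stand-in: would invoke a reasoner on `ontology`
    # with the given timeout and return the classification time in
    # seconds, or None if the run was cancelled at the time limit.
    raise NotImplementedError


def benchmark_time(ontology, runner=run_classification):
    """Average classification time over RUNS runs; None on any timeout."""
    times = [runner(ontology, TIMEOUT_BENCHMARK) for _ in range(RUNS)]
    if any(t is None for t in times):
        return None  # a cancelled run counts as a failure, not a time
    return statistics.mean(times)
```

A single worker thread per reasoner (as configured in the paper) would be an argument to the hypothetical `runner`, keeping comparisons independent of CPU core counts.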