Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Logical Explanations for Deep Relational Machines Using Relevance Information
Authors: Ashwin Srinivasan, Lovekesh Vig, Michael Bain
JMLR 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In Section 5, we present an empirical evaluation of the predictive and explanatory models, using some benchmark datasets. Appendix C contains details of the domain-specific relevance information used in the experiments. |
| Researcher Affiliation | Collaboration | Ashwin Srinivasan (EMAIL), Department of Computer Sc. & Information Systems, BITS Pilani, K.K. Birla Goa Campus, Goa, India; Lovekesh Vig (EMAIL), TCS Research, New Delhi, India; Michael Bain (EMAIL), School of Computer Science and Engineering, University of New South Wales, Sydney, Australia. |
| Pseudocode | Yes | Algorithm 1: A non-deterministic procedure for identifying a single-clause unstructured explanation. Algorithm 2: A non-deterministic procedure for obtaining a structured explanation from an unstructured explanation, by inventing k features. Algorithm 3: Identifying an unstructured explanation with maximal fidelity. Algorithm 4: A procedure for obtaining a structured explanation that is at least as relevant as an unstructured explanation H. |
| Open Source Code | No | The paper does not contain an explicit statement of code release or a link to a repository for the methodology described. It mentions using existing tools like the Aleph ILP system and Keras/Theano, but not their own implementation's source code. |
| Open Datasets | Yes | We report results from experiments conducted using 7 well-studied real world problems from the ILP literature. These are: Mutagenesis (King et al., 1996a); Carcinogenesis (King and Srinivasan, 1996a); DssTox (Muggleton et al., 2008); and 4 datasets arising from the comparison of Alzheimer's drugs, denoted here as Amine, Choline, Scop and Toxic (Srinivasan et al., 1996). |
| Dataset Splits | Yes | For all the datasets, 10-fold cross-validated estimates of the predictive performance using ILP methods are available in the ILP literature for comparison. We use the same approach. This requires constructing DRMs separately for each of the cross-validation training sets, and testing them on the corresponding test sets to obtain estimates of the predictive accuracy. ... We use the same 10-fold cross-validation strategy for estimating this probability (for efficiency, we use the same splits as those used to estimate predictive accuracy). |
| Hardware Specification | Yes | Random features were constructed on an Intel Core i7 laptop computer, using VMware virtual machine running Fedora 13, with an allocation of 2GB for the virtual machine. ... The deep networks were constructed using the Keras library with Theano as the backend, and were trained using an NVIDIA K-40 GPU card. |
| Software Dependencies | No | The paper mentions software such as 'VMware virtual machine running Fedora 13', 'Prolog compiler used was Yap', 'Aleph ILP system (Srinivasan, 1999)', 'Keras library with Theano as the backend', and 'Subtle algorithm (Blockeel and Valevich, 2016)'. However, it does not provide specific version numbers for these software components, only publication years for Aleph and Subtle. |
| Experiment Setup | Yes | We use a straightforward Deep Neural Network (DNN) architecture. There are multiple, fully connected feedforward layers of rectified linear (ReLU) units followed by Dropout for regularization (see Goodfellow et al. (2016) for a description of these ideas). The model weights were initialized with a Gaussian distribution. The number of layers, number of units for each layer, the optimizers, and other training hyperparameters such as learning rate, were determined via a validation set, which is part of the training data. Since the data is limited for the datasets under consideration, after obtaining the model which yields the best validation score, the chosen model is then retrained on the complete training set (this includes the validation set) until the training loss exceeds the training loss obtained for the chosen model during validation. |
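The evaluation protocol quoted in the Dataset Splits row (10-fold cross-validation, with the same splits reused for both predictive accuracy and fidelity estimation) can be sketched as follows. This is a generic illustration of the protocol, not the authors' code; the fold construction and `train_and_score` callback are assumptions.

```python
import random

def make_folds(n_examples, k=10, seed=0):
    """Partition example indices into k disjoint folds.
    Generic sketch -- the paper does not specify how its splits were drawn."""
    idx = list(range(n_examples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validate(n_examples, train_and_score, k=10, seed=0):
    """Train on k-1 folds, test on the held-out fold, and average the scores.
    Fixing `seed` lets the same splits be reused across different metrics,
    as the paper does for accuracy and fidelity."""
    folds = make_folds(n_examples, k, seed)
    scores = []
    for i, test_idx in enumerate(folds):
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        scores.append(train_and_score(train_idx, test_idx))
    return sum(scores) / k
```

Reusing one seeded fold assignment for every estimate (rather than resplitting per metric) matches the paper's stated efficiency choice of sharing splits between the accuracy and fidelity experiments.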
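The retraining rule in the Experiment Setup row (after model selection on a validation split, retrain on the complete training set until the training loss reaches the level the chosen model attained during validation) can be sketched as below. This is one reading of the quoted stopping rule, with the `run_epoch` callback and epoch cap being assumptions; the Keras/Theano specifics are omitted.

```python
def retrain_to_reference_loss(run_epoch, initial_loss, reference_loss,
                              max_epochs=1000):
    """Retrain on the full training set (validation split included) until
    the training loss no longer exceeds `reference_loss`, the training loss
    the selected model reached during the validation phase.
    `run_epoch()` performs one training epoch and returns the new loss.
    The epoch cap is a safety assumption, not part of the paper's rule."""
    loss, epochs = initial_loss, 0
    while loss > reference_loss and epochs < max_epochs:
        loss = run_epoch()
        epochs += 1
    return loss, epochs
```

This kind of rule is a common way to exploit limited data: the validation split is folded back into training, and the loss level from the selection phase serves as the stopping target in place of a held-out early-stopping signal.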