Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Logical Explanations for Deep Relational Machines Using Relevance Information
Authors: Ashwin Srinivasan, Lovekesh Vig, Michael Bain
JMLR 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In Section 5, we present an empirical evaluation of the predictive and explanatory models, using some benchmark datasets. Appendix C contains details of the domain-specific relevance information used in the experiments. |
| Researcher Affiliation | Collaboration | Ashwin Srinivasan (EMAIL), Department of Computer Sc. & Information Systems, BITS Pilani, K.K. Birla Goa Campus, Goa, India; Lovekesh Vig (EMAIL), TCS Research, New Delhi, India; Michael Bain (EMAIL), School of Computer Science and Engineering, University of New South Wales, Sydney, Australia. |
| Pseudocode | Yes | Algorithm 1: A non-deterministic procedure for identifying a single-clause unstructured explanation. Algorithm 2: A non-deterministic procedure for obtaining a structured explanation from an unstructured explanation, by inventing k features. Algorithm 3: Identifying an unstructured explanation with maximal fidelity. Algorithm 4: A procedure for obtaining a structured explanation that is at least as relevant as an unstructured explanation H. |
| Open Source Code | No | The paper does not contain an explicit statement of code release or a link to a repository for the methodology described. It mentions using existing tools like the Aleph ILP system and Keras/Theano, but not their own implementation's source code. |
| Open Datasets | Yes | We report results from experiments conducted using 7 well-studied real world problems from the ILP literature. These are: Mutagenesis (King et al., 1996a); Carcinogenesis (King and Srinivasan, 1996a); DssTox (Muggleton et al., 2008); and 4 datasets arising from the comparison of Alzheimer's drugs, denoted here as Amine, Choline, Scop and Toxic (Srinivasan et al., 1996). |
| Dataset Splits | Yes | For all the datasets, 10-fold cross-validated estimates of the predictive performance using ILP methods are available in the ILP literature for comparison. We use the same approach. This requires constructing DRMs separately for each of the cross-validation training sets, and testing them on the corresponding test sets to obtain estimates of the predictive accuracy. ... We use the same 10-fold cross-validation strategy for estimating this probability (for efficiency, we use the same splits as those used to estimate predictive accuracy). |
| Hardware Specification | Yes | Random features were constructed on an Intel Core i7 laptop computer, using VMware virtual machine running Fedora 13, with an allocation of 2GB for the virtual machine. ... The deep networks were constructed using the Keras library with Theano as the backend, and were trained using an NVIDIA K-40 GPU card. |
| Software Dependencies | No | The paper mentions software such as 'VMware virtual machine running Fedora 13', 'Prolog compiler used was Yap', 'Aleph ILP system (Srinivasan, 1999)', 'Keras library with Theano as the backend', and 'Subtle algorithm (Blockeel and Valevich, 2016)'. However, it does not provide specific version numbers for these software components, only publication years for Aleph and Subtle. |
| Experiment Setup | Yes | We use a straightforward Deep Neural Network (DNN) architecture. There are multiple, fully connected feedforward layers of rectified linear (ReLU) units followed by Dropout for regularization (see Goodfellow et al. (2016) for a description of these ideas). The model weights were initialized with a Gaussian distribution. The number of layers, number of units for each layer, the optimizers, and other training hyperparameters such as learning rate, were determined via a validation set, which is part of the training data. Since the data is limited for the datasets under consideration, after obtaining the model which yields the best validation score, the chosen model is then retrained on the complete training set (this includes the validation set) until the training loss exceeds the training loss obtained for the chosen model during validation. |
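The evaluation protocol quoted in the Dataset Splits row (10-fold cross-validation, with the same splits reused for both predictive accuracy and fidelity estimation) can be sketched as follows. This is a generic illustration of the protocol, not the authors' code; the fold construction and `train_and_score` callback are assumptions.

```python
import random

def make_folds(n_examples, k=10, seed=0):
    """Partition example indices into k disjoint folds.
    Generic sketch -- the paper does not specify how its splits were drawn."""
    idx = list(range(n_examples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validate(n_examples, train_and_score, k=10, seed=0):
    """Train on k-1 folds, test on the held-out fold, and average the scores.
    Fixing `seed` lets the same splits be reused across different metrics,
    as the paper does for accuracy and fidelity."""
    folds = make_folds(n_examples, k, seed)
    scores = []
    for i, test_idx in enumerate(folds):
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        scores.append(train_and_score(train_idx, test_idx))
    return sum(scores) / k
```

Reusing one seeded fold assignment for every estimate (rather than resplitting per metric) matches the paper's stated efficiency choice of sharing splits between the accuracy and fidelity experiments.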
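The retraining rule in the Experiment Setup row (after model selection on a validation split, retrain on the complete training set until the training loss reaches the level the chosen model attained during validation) can be sketched as below. This is one reading of the quoted stopping rule, with the `run_epoch` callback and epoch cap being assumptions; the Keras/Theano specifics are omitted.

```python
def retrain_to_reference_loss(run_epoch, initial_loss, reference_loss,
                              max_epochs=1000):
    """Retrain on the full training set (validation split included) until
    the training loss no longer exceeds `reference_loss`, the training loss
    the selected model reached during the validation phase.
    `run_epoch()` performs one training epoch and returns the new loss.
    The epoch cap is a safety assumption, not part of the paper's rule."""
    loss, epochs = initial_loss, 0
    while loss > reference_loss and epochs < max_epochs:
        loss = run_epoch()
        epochs += 1
    return loss, epochs
```

This kind of rule is a common way to exploit limited data: the validation split is folded back into training, and the loss level from the selection phase serves as the stopping target in place of a held-out early-stopping signal.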