Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
ExDDI: Explaining Drug-Drug Interaction Predictions with Natural Language
Authors: Zhaoyue Sun, Jiazheng Li, Gabriele Pergola, Yulan He
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conducted extensive experiments in both transductive and inductive settings to meet the needs of application scenarios. We propose and evaluate the performance of the ExDDI family methods for DDI explanation generation... Our experiments reveal that top-performed fine-tuning methods can effectively capture molecular similarities and generate accurate explanations in the transductive setting. Table 1 presents the evaluation results of the models in explanation generation. Figure 3 presents the performance of different models in the DDI binary prediction task. |
| Researcher Affiliation | Academia | 1Department of Computer Science, University of Warwick 2Department of Informatics, King's College London 3The Alan Turing Institute |
| Pseudocode | No | The paper describes the methods used (e.g., ExDDI-S2S, ExDDI-MT, ExDDI-MTS, Retrieval-based Method, LLM-based In-Context Prompting) in descriptive text and uses mathematical formulas for loss functions, but it does not contain any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code and Data: https://github.com/ZhaoyueSun/ExDDI |
| Open Datasets | Yes | We evaluate the model performance based on two databases: DDInter (Xiong et al. 2022) and DrugBank (v5.1.10). DrugBank is a widely used resource for training and evaluating DDI prediction models... As a valuable resource, Xiong et al. (2022) constructed the DDInter database, gathering information on 1.8k approved drugs and 0.24M associated DDIs, along with detailed explanations. |
| Dataset Splits | Yes | For the transductive setting, we randomly divided all positive and negative samples into training/validation/test sets with a ratio of 0.7/0.1/0.2... We conduct 5-fold cross-validation for all settings. For the inductive setting, we evaluate the model's performance not only on unknown DDIs but also on unknown drugs. Specifically, the test set is split into inductive S1 and inductive S2 subsets according to whether both drugs are unavailable in the training set or only one drug is unavailable in the training set. We first divided drugs into three sets, M1, M2, and M3, with proportions of 0.75/0.05/0.2. Then, the training set consists of DDI samples where both drugs in the queried drug pair are from M1; the validation set includes samples where both drugs are from M2, or one is from M2 and the other is from M1; the inductive S1 test set contains samples where both drugs are from M3; and the inductive S2 test set contains samples where one drug is from M1 and the other is from M3. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, processor types, memory amounts) used for running its experiments. It only refers to 'running the model' or 'training'. |
| Software Dependencies | No | The paper mentions 'RDKit' for extracting MACCS keys, 'MolT5' as the backbone encoder-decoder, and 'ChatGPT (OpenAI 2022)' for LLM-based prompting. However, it does not provide specific version numbers for these software components, which is required for a reproducible description of ancillary software. |
| Experiment Setup | No | The paper states: 'For hyper-parameter selection and training details, please refer to Appendix B (Sun et al. 2024).' While it refers to an appendix for details, the main text does not explicitly provide concrete hyperparameter values or detailed training configurations itself. |
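The Dataset Splits entry describes a concrete procedure: partition the drug vocabulary into M1/M2/M3 at 0.75/0.05/0.2, then bucket each DDI pair by which sets its two drugs fall into. A minimal sketch of that bucketing logic, assuming simple drug identifiers and (drug, drug) pairs; all names here are illustrative, not the authors' code:

```python
import random

def inductive_split(drugs, pairs, seed=0):
    """Sketch of the paper's inductive split: drugs go to M1/M2/M3
    (0.75/0.05/0.2), and each DDI pair is bucketed by membership."""
    rng = random.Random(seed)
    shuffled = list(drugs)
    rng.shuffle(shuffled)
    n = len(shuffled)
    c1, c2 = int(0.75 * n), int(0.80 * n)
    m1, m2, m3 = set(shuffled[:c1]), set(shuffled[c1:c2]), set(shuffled[c2:])

    def bucket(d):
        return "M1" if d in m1 else ("M2" if d in m2 else "M3")

    splits = {"train": [], "val": [], "inductive_s1": [], "inductive_s2": []}
    for a, b in pairs:
        key = frozenset((bucket(a), bucket(b)))
        if key == {"M1"}:
            splits["train"].append((a, b))            # both drugs seen in training
        elif "M2" in key and key <= {"M1", "M2"}:
            splits["val"].append((a, b))              # both M2, or M1 + M2
        elif key == {"M3"}:
            splits["inductive_s1"].append((a, b))     # both drugs unseen
        elif key == {"M1", "M3"}:
            splits["inductive_s2"].append((a, b))     # exactly one drug unseen
        # pairs mixing M2 and M3 fall outside the described buckets
    return splits, (m1, m2, m3)
```

Note that under this scheme a pair joining an M2 drug with an M3 drug belongs to none of the four buckets, which is consistent with the split definitions quoted above.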