Interpretable Drug Target Prediction Using Deep Neural Representation

Authors: Kyle Yingkai Gao, Achille Fokoue, Heng Luo, Arun Iyengar, Sanjoy Dey, Ping Zhang

IJCAI 2018

Reproducibility Variable | Result | LLM Response

Research Type | Experimental
    We experimentally compared our model with matrix factorization, similarity-based methods, and a previous deep learning approach. Overall, the results show that our model outperforms the other approaches without requiring domain knowledge or feature engineering.

Researcher Affiliation | Industry
    Kyle Yingkai Gao, Achille Fokoue, Heng Luo, Arun Iyengar, Sanjoy Dey, Ping Zhang. IBM Research AI, 1101 Kitchawan Road, Yorktown Heights, NY 10598. kyle.ygao@gmail.com, heng.luo@ibm.com, {achille, aruni, deysa, pzhang}@us.ibm.com

Pseudocode | Yes
    Algorithm 1: Pseudocode of graph CNN.

Open Source Code | No
    The paper does not explicitly state that its source code is open-sourced, nor does it provide a direct link to the implementation. The link in footnote 2 (https://github.com/IBM/InterpretableDTIP) points to the dataset used, not to the model's source code.

Open Datasets | Yes
    BindingDB [Gilson et al., 2016] is a public, web-accessible database for medicinal chemistry, computational chemistry, and systems pharmacology. We took a snapshot of BindingDB that contains 1.3 million data records... By the following criteria we construct a binary classification dataset2 with 39,747 positive examples and 31,218 negative examples. (footnote 2: https://github.com/IBM/InterpretableDTIP)

Dataset Splits | Yes
    We split proteins and drugs into those that should be observed in training and those that should not, under four experimental settings; we then allocate DTI pairs into training, development, and testing datasets. Statistics of the datasets are shown in Table 1. Table 1: The number of distinct proteins, drugs, known positive pairs, and known negative pairs of the training, development, and testing datasets. Train... Dev... Test...

Hardware Specification | No
    The paper does not provide specific details about the hardware used for the experiments, such as GPU models, CPU types, or memory specifications.

Software Dependencies | No
    The paper mentions software such as RDKit, LIBMF, Tiresias, and scikit-optimize but does not provide version numbers for these or any other software dependencies.

Experiment Setup | Yes
    During training, the parameters are initialized randomly from a uniform distribution U(-0.08, 0.08). In each step, with batch size equal to 32, a batch of proteins or drugs is randomly selected from the training data. ...we use Adam gradient descent optimization with an initial learning rate of 0.001 to train the parameters. We train the model for 30 epochs, where each epoch consists of 100 steps. ...The hyperparameter values of the best model are shown in Table 2, and the best classification boundary is δ = 0.4995.
    Table 2 (best hyperparameters):
        Protein Sequence Embedding Size: 16; Hidden Dimension: 16; Embedding Dropout: 0.1
        GO Embedding Size: 16; Embedding Dropout: 0.1
        Drug Graph CNN Hidden Dimension: 64
        Siamese Hidden Size: 32; Dropout: 0.1
        γ: 0.0005
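As a rough illustration of the reported training schedule (uniform initialization in [-0.08, 0.08], Adam with learning rate 0.001, batch size 32, 30 epochs of 100 steps each), the sketch below applies those settings to a toy logistic-regression stand-in. The model, data, and variable names are placeholders for illustration only; this is not the paper's architecture or code.

```python
import numpy as np

rng = np.random.default_rng(0)

dim = 16                                      # toy parameter dimension
theta = rng.uniform(-0.08, 0.08, size=dim)    # reported uniform init U(-0.08, 0.08)

# Adam optimizer state and hyperparameters (lr matches the reported 0.001;
# beta/eps values are Adam's common defaults, assumed here).
m, v = np.zeros(dim), np.zeros(dim)
lr, b1, b2, eps = 1e-3, 0.9, 0.999, 1e-8

def adam_step(theta, grad, m, v, t):
    """One bias-corrected Adam update."""
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad ** 2
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)
    return theta - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

t = 0
for epoch in range(30):            # 30 epochs, as reported
    for step in range(100):        # 100 steps per epoch, as reported
        t += 1
        x = rng.normal(size=(32, dim))               # batch of 32, as reported
        y = (x.sum(axis=1) > 0).astype(float)        # toy binary labels
        p = 1.0 / (1.0 + np.exp(-x @ theta))         # sigmoid predictions
        grad = x.T @ (p - y) / 32                    # mean logistic-loss gradient
        theta, m, v = adam_step(theta, grad, m, v, t)
```

The 30 x 100 schedule yields 3,000 parameter updates in total; on this toy task that is ample for the classifier to separate the two classes.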