CASTER: Predicting Drug Interactions with Chemical Substructure Representation

Authors: Kexin Huang, Cao Xiao, Trong Hoang, Lucas Glass, Jimeng Sun

AAAI 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluated CASTER on two real-world DDI datasets and showed that it performed better than state-of-the-art baselines and provided interpretable predictions.
Researcher Affiliation | Collaboration | (1) Harvard T. H. Chan School of Public Health, Harvard University, Boston, MA, USA; (2) Analytic Center of Excellence, IQVIA, Cambridge, MA, USA; (3) MIT-IBM Watson AI Lab, IBM Research, Cambridge, MA, USA; (4) Computational Science and Engineering, Georgia Institute of Technology, Atlanta, GA, USA
Pseudocode | Yes | Algorithm 1: The Chemical Sequential Pattern Mining Algorithm
    Input: η, the practitioner-specified frequency threshold; ℓ, the maximum size of V
    Initialize V to the set of all atoms and bonds, and W to the set of tokenized SMILES strings
    for t = 1 . . . ℓ do
        (A, B), FREQ ← scan(W)  // (A, B) and FREQ are the most frequent pair and its frequency
        if FREQ < η then break  // frequency lower than threshold
        W ← find (A, B) in W and replace with (AB)  // update W with the new token (AB)
        V ← V ∪ {(AB)}  // add (AB) to the vocabulary set V
    end
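Read as a BPE-style merge loop, Algorithm 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' released code: the function name mine_substructures, the token-list input format, and the pair-scan/replace logic are ours.

```python
from collections import Counter

def mine_substructures(tokenized_smiles, eta=50, max_vocab_size=30000):
    """BPE-style sequential pattern mining over tokenized SMILES (sketch of Algorithm 1).

    tokenized_smiles: list of token lists, e.g. [['C', 'C', '(', '=O', ')', 'O'], ...]
    eta: frequency threshold; max_vocab_size: maximum size of the vocabulary V (the l above).
    Returns V, the set of frequent substructure tokens.
    """
    W = [list(seq) for seq in tokenized_smiles]      # working corpus of token sequences
    V = {tok for seq in W for tok in seq}            # initialize V with all atoms and bonds

    while len(V) < max_vocab_size:
        # Scan W for the most frequent adjacent token pair.
        pair_counts = Counter()
        for seq in W:
            pair_counts.update(zip(seq, seq[1:]))
        if not pair_counts:
            break
        (A, B), freq = pair_counts.most_common(1)[0]
        if freq < eta:                               # frequency lower than threshold: stop
            break
        AB = A + B
        # Update W: replace every occurrence of the pair (A, B) with the new token (AB).
        merged_W = []
        for seq in W:
            out, i = [], 0
            while i < len(seq):
                if i + 1 < len(seq) and seq[i] == A and seq[i + 1] == B:
                    out.append(AB)
                    i += 2
                else:
                    out.append(seq[i])
                    i += 1
            merged_W.append(out)
        W = merged_W
        V.add(AB)                                    # add (AB) to the vocabulary set V
    return V
```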
Open Source Code | Yes | Source code is at https://github.com/kexinhuang12345/CASTER
Open Datasets | Yes | We evaluated CASTER using two datasets. (1) DrugBank (DDI) (Wishart et al. 2008), which includes 1,850 approved drugs... (2) BIOSNAP (Marinka Zitnik and Leskovec 2018), which consists of 1,322 approved drugs with 41,520 labelled DDIs obtained from drug labels and scientific publications.
Dataset Splits | Yes | We randomly divided the dataset into training, validation and testing sets in a 7:1:2 ratio.
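For reference, a 7:1:2 random split can be reproduced with a few lines of NumPy; the helper name and seed below are illustrative, not taken from the paper.

```python
import numpy as np

def split_7_1_2(n_examples, seed=0):
    """Shuffle indices and split them 70% / 10% / 20% into train / valid / test."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_examples)
    n_train, n_valid = int(0.7 * n_examples), int(0.1 * n_examples)
    return idx[:n_train], idx[n_train:n_train + n_valid], idx[n_train + n_valid:]
```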
Hardware Specification | Yes | For training, we use a server with 2 Intel Xeon E5-2670v2 2.5 GHz CPUs, 128 GB RAM and 3 NVIDIA Tesla K80 GPUs.
Software Dependencies | No | All methods are implemented in PyTorch (Paszke et al. 2017). For preprocessing of the SMILES strings, we use the RDKit software to convert all strings into canonical form.
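The canonicalization step maps each SMILES string to RDKit's canonical form; a minimal sketch (the wrapper function is ours, the RDKit calls are standard):

```python
from rdkit import Chem

def to_canonical(smiles):
    """Return the RDKit canonical form of a SMILES string, or None if it fails to parse."""
    mol = Chem.MolFromSmiles(smiles)
    return Chem.MolToSmiles(mol, canonical=True) if mol is not None else None
```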
Experiment Setup | Yes | CASTER Setup. We found the following hyperparameters to be the best fit for CASTER. For SPM, we set the frequency threshold η to 50, which results in k = 1,722 frequent substructures. The dimension d of the latent representation from the encoder is set to 50. The encoder and decoder both use three-layer perceptrons of hidden size 500 with ReLU activation. The predictor uses a six-layer perceptron with the first three layers of hidden size 1,024 and the last layer of size 64, and applies 1-D batch normalization with ReLU activation units. For the trade-off between the different losses, we set α = 1e-1, β = 1e-1, γ = 1. For the regularization coefficients, we set λ1 = 1e-5, λ2 = 1e-1. We first pretrain the network for one epoch on the unlabelled dataset and then proceed to training on the labelled dataset. We use batch size 256 with the Adam optimizer at learning rate 1e-3. All methods are implemented in PyTorch (Paszke et al. 2017).
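The stated dimensions translate into the following PyTorch sketch of the three networks and the optimizer. This is an assumption-laden illustration, not the released CASTER code: the predictor's input width, its two unspecified middle layer sizes (taken here as 256), and the single-logit output are guesses, and CASTER's dictionary-projection step between encoder and predictor is omitted.

```python
import torch
import torch.nn as nn

K = 1722  # k frequent substructures from SPM (eta = 50)
D = 50    # latent dimension d of the encoder output

# Encoder and decoder: three-layer perceptrons, hidden size 500, ReLU activations.
encoder = nn.Sequential(
    nn.Linear(K, 500), nn.ReLU(),
    nn.Linear(500, 500), nn.ReLU(),
    nn.Linear(500, D),
)
decoder = nn.Sequential(
    nn.Linear(D, 500), nn.ReLU(),
    nn.Linear(500, 500), nn.ReLU(),
    nn.Linear(500, K),
)

# Predictor: six layers, the first three of hidden size 1,024 and the last of size 64,
# with 1-D batch normalization and ReLU. The input width K and the middle widths (256)
# are assumptions; the paper specifies only the first three layers and the last.
predictor = nn.Sequential(
    nn.Linear(K, 1024), nn.BatchNorm1d(1024), nn.ReLU(),
    nn.Linear(1024, 1024), nn.BatchNorm1d(1024), nn.ReLU(),
    nn.Linear(1024, 1024), nn.BatchNorm1d(1024), nn.ReLU(),
    nn.Linear(1024, 256), nn.BatchNorm1d(256), nn.ReLU(),
    nn.Linear(256, 64), nn.BatchNorm1d(64), nn.ReLU(),
    nn.Linear(64, 1),  # DDI probability logit
)

params = (list(encoder.parameters()) + list(decoder.parameters())
          + list(predictor.parameters()))
optimizer = torch.optim.Adam(params, lr=1e-3)  # batch size 256 per the setup above
```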