Adverse Drug Reaction Prediction with Symbolic Latent Dirichlet Allocation

Authors: Cao Xiao, Ping Zhang, W. Chaovalitwongse, Jianying Hu, Fei Wang

AAAI 2017

Reproducibility Variable Result LLM Response
Research Type Experimental Experiments on real world data show our models achieved higher prediction accuracy and shorter running time than the state-of-the-art alternatives.
Researcher Affiliation Collaboration Cao Xiao,* Ping Zhang,* W. Art Chaovalitwongse, Jianying Hu,* Fei Wang; *IBM T. J. Watson Research Center, Yorktown Heights, NY 10598; University of Arkansas, Fayetteville, AR 72701; Healthcare Policy and Research, Weill Cornell Medical College, Cornell University, New York, NY 10065
Pseudocode Yes
procedure THE TRAINING PROCEDURE
    {Doc_1, ..., Doc_D}: drug documents
    K: number of ADR topics
    train LDA({Doc_1, ..., Doc_D}, K)
    β: ADR distribution of ADR topics
    Θ: ADR topic distribution
    X = {x_d}: drug structure features
    train θ_d ~ x_d with lasso
end procedure
procedure THE PREDICTION PROCEDURE
    x_d: drug structure features for target drug d
    predict topic distribution θ_d with lasso
    β: ADR distribution of ADR topics
    predict ADRs using β and θ_d with LDA
end procedure
procedure THE GENERATION PROCESS OF LDA FOR DRUG DOCUMENT
    d: index of drug
    D: number of drugs in the corpus
    for d ∈ [1, ..., D] do
        draw a topic mixture θ_d such that p(θ|α) = Dirichlet(α)
    end for
    n: index of ADR
    N: number of candidate ADRs in the ADR corpus
    for n ∈ [1, ..., N] do
        draw a topic z_n ~ Multinomial(θ)
        draw a word w_n from p(w_n|z_n, β), a multinomial conditioned on topic z_n
    end for
end procedure
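The generation process quoted above can be illustrated with a minimal Python sketch. It is not the authors' implementation (the paper used the R package lda); the function names, the toy β matrix, and the mixture scoring rule in `adr_scores` (scoring ADR n as Σ_k θ_k β_kn) are assumptions made for illustration only.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw one sample from Dirichlet(alpha) by normalizing Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [g / total for g in draws]

def generate_drug_document(alpha, beta, n_words, rng):
    """Generate one toy drug document as a list of ADR indices.

    alpha: Dirichlet prior over K topics (length K, all entries > 0).
    beta:  K rows, each a probability distribution over the ADR vocabulary.
    """
    theta = sample_dirichlet(alpha, rng)            # theta_d ~ Dirichlet(alpha)
    topics = range(len(alpha))
    vocab = range(len(beta[0]))
    doc = []
    for _ in range(n_words):
        z = rng.choices(topics, weights=theta)[0]   # z_n ~ Multinomial(theta_d)
        w = rng.choices(vocab, weights=beta[z])[0]  # w_n ~ p(w_n | z_n, beta)
        doc.append(w)
    return theta, doc

def adr_scores(theta, beta):
    """Assumed scoring rule: score of ADR n is sum_k theta[k] * beta[k][n]."""
    n_topics = len(theta)
    n_adrs = len(beta[0])
    return [sum(theta[k] * beta[k][n] for k in range(n_topics))
            for n in range(n_adrs)]
```

In the prediction procedure, θ_d would come from the lasso model fitted on drug structure features rather than from a Dirichlet draw; the sketch only shows how θ_d and β combine once both are available.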
Open Source Code No The paper does not provide concrete access to source code for the methodology described in this paper.
Open Datasets Yes We used ADReCS database (Cai et al. 2015) in evaluation. The drug-ADR information of ADReCS was mainly extracted from the drug labels in the DailyMed, a website managed by the U.S. National Library of Medicine (NLM) to provide comprehensive information about marketed drugs.
Dataset Splits Yes We performed 20-fold cross validation to evaluate our models against baseline methods including lasso and CCA. Specifically, in each iteration, 95% of the training drug was used to construct the models and the remaining 5% of the drug was used for performance testing.
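The 95%/5% split described above is what a shuffled 20-fold partition produces. The following Python sketch shows one way to build such folds over drug indices; the function name, the fixed seed, and the interleaved fold assignment are assumptions, not details taken from the paper.

```python
import random

def k_fold_indices(n_items, n_folds=20, seed=0):
    """Yield (train_idx, test_idx) pairs for shuffled k-fold cross-validation."""
    idx = list(range(n_items))
    random.Random(seed).shuffle(idx)
    # Interleaved slicing gives folds whose sizes differ by at most one.
    folds = [idx[i::n_folds] for i in range(n_folds)]
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx
```

With 20 folds, each iteration holds out roughly 5% of the drugs for testing and trains on the remaining 95%, and every drug appears in exactly one test fold.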
Hardware Specification No The paper does not provide specific hardware details used for running its experiments.
Software Dependencies Yes We processed the data using Python package pandas (McKinney 2015), as well as evaluated the algorithms using R packages glmnet (Friedman, Hastie, and Tibshirani 2010), cca (González et al. 2008), and lda (Chang 2015), respectively.
Experiment Setup Yes We also tune the number of topics K from 20 to 140 with 20 per increment, and select one that gives the best performance. In all cases, we have K = 100.
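The tuning grid described above (K from 20 to 140 in steps of 20) can be written out as a short Python sketch. The helper names and the `score_fn` callback are hypothetical; the paper does not say which performance measure drove the selection, so the scoring function is left to the caller.

```python
def candidate_topic_counts(start=20, stop=140, step=20):
    """Return the grid of topic counts K, inclusive of both endpoints."""
    return list(range(start, stop + 1, step))

def select_best_k(score_fn, candidates):
    """Return the candidate K with the highest score_fn(K).

    score_fn is a hypothetical callback, e.g. mean cross-validated
    performance of the model trained with K topics.
    """
    return max(candidates, key=score_fn)
```

The default grid contains seven values: 20, 40, 60, 80, 100, 120, and 140.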