Adverse Drug Reaction Prediction with Symbolic Latent Dirichlet Allocation
Authors: Cao Xiao, Ping Zhang, W. Chaovalitwongse, Jianying Hu, Fei Wang
AAAI 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on real world data show our models achieved higher prediction accuracy and shorter running time than the state-of-the-art alternatives. |
| Researcher Affiliation | Collaboration | Cao Xiao, Ping Zhang, W. Art Chaovalitwongse, Jianying Hu, Fei Wang. Affiliations: IBM T. J. Watson Research Center, Yorktown Heights, NY 10598; University of Arkansas, Fayetteville, AR 72701; Healthcare Policy and Research, Weill Cornell Medical College, Cornell University, New York, NY 10065 |
| Pseudocode | Yes | Training procedure: given drug documents {Doc_1, ..., Doc_D} and the number of ADR topics K, train LDA({Doc_1, ..., Doc_D}, K) to obtain the ADR distribution of each topic (β) and the per-drug topic distributions (Θ); then, given drug structure features X = {x_d}, train a lasso regression from x_d to θ_d. Prediction procedure: given structure features x_d of a target drug d, predict its topic distribution θ_d with the lasso model, then predict ADRs from β and θ_d with LDA. Generative process of LDA: for each drug d in [1, ..., D], draw a topic mixture θ_d with p(θ\|α) = Dirichlet(α); for each of the N candidate ADRs, draw a topic z_n ~ Multinomial(θ) and draw a word w_n from p(w_n\|z_n, β), a multinomial conditioned on topic z_n. (A hedged Python sketch of this train/predict pipeline follows the table.) |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described in this paper. |
| Open Datasets | Yes | We used ADReCS database (Cai et al. 2015) in evaluation. The drug-ADR information of ADReCS was mainly extracted from the drug labels in DailyMed, a website managed by the U.S. National Library of Medicine (NLM) to provide comprehensive information about marketed drugs. |
| Dataset Splits | Yes | We performed 20-fold cross validation to evaluate our models against baseline methods including lasso and CCA. Specifically, in each iteration, 95% of the training drugs were used to construct the models and the remaining 5% of the drugs were used for performance testing. (A cross-validation and topic-number tuning sketch follows the table.) |
| Hardware Specification | No | The paper does not provide specific hardware details used for running its experiments. |
| Software Dependencies | Yes | We processed the data using the Python package pandas (McKinney 2015), and evaluated the algorithms using the R packages glmnet (Friedman, Hastie, and Tibshirani 2010), cca (González et al. 2008), and lda (Chang 2015), respectively. |
| Experiment Setup | Yes | We also tune the number of topics K from 20 to 140 with 20 per increment, and select one that gives the best performance. In all cases, we have K = 100. |
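The pseudocode row above outlines the train/predict pipeline: fit LDA on per-drug ADR documents, regress the topic mixtures on drug structure features with lasso, and score ADRs for a new drug from its predicted mixture. The following is a minimal sketch of that idea, assuming scikit-learn in place of the R packages (lda, glmnet) the paper reports; all function and variable names (train, predict_adr_scores, drug_docs, X_struct) and the hyperparameter values are illustrative assumptions, not the authors' code.

```python
# Minimal sketch of the Symbolic LDA train/predict pipeline, assuming scikit-learn
# stands in for the R packages (lda, glmnet) reported in the paper. Names and
# hyperparameters here are illustrative, not the authors' implementation.
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import Lasso

def train(drug_docs, X_struct, n_topics=100):
    """drug_docs: one 'ADR document' (space-separated ADR terms) per training drug.
    X_struct: (n_drugs, n_features) matrix of drug structure features."""
    vec = CountVectorizer()
    counts = vec.fit_transform(drug_docs)                # drug-by-ADR-term count matrix
    lda = LatentDirichletAllocation(n_components=n_topics, random_state=0)
    theta = lda.fit_transform(counts)                    # per-drug topic mixtures (Theta)
    theta /= theta.sum(axis=1, keepdims=True)
    lasso = Lasso(alpha=0.01)                            # regression from x_d to theta_d
    lasso.fit(X_struct, theta)
    return vec, lda, lasso

def predict_adr_scores(vec, lda, lasso, x_new):
    """Score candidate ADRs for a new drug from its structure features x_new (1-D array) alone."""
    theta_hat = np.clip(lasso.predict(x_new.reshape(1, -1)), 0.0, None)
    theta_hat /= theta_hat.sum() + 1e-12                 # predicted topic mixture theta_d
    beta = lda.components_ / lda.components_.sum(axis=1, keepdims=True)  # topic-ADR distributions (beta)
    scores = (theta_hat @ beta).ravel()                  # one score per ADR term in the vocabulary
    return dict(zip(vec.get_feature_names_out(), scores))
```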
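The dataset-splits and experiment-setup rows describe 20-fold cross-validation and a sweep of the topic number K from 20 to 140 in steps of 20. Below is a minimal sketch of that protocol, assuming a user-supplied evaluate_fold callback that wraps the pipeline above and returns an accuracy score; the paper's exact metric is not reproduced here.

```python
# Minimal sketch of the evaluation protocol: 20-fold cross-validation with the
# topic number K swept over {20, 40, ..., 140}. `evaluate_fold` is an assumed
# callback (e.g. wrapping the train/predict sketch above plus a chosen metric).
import numpy as np
from sklearn.model_selection import KFold

def tune_num_topics(X_struct, evaluate_fold, ks=range(20, 141, 20)):
    best_k, best_score = None, -np.inf
    for k in ks:
        kf = KFold(n_splits=20, shuffle=True, random_state=0)     # ~95% train / 5% test per fold
        fold_scores = [evaluate_fold(train_idx, test_idx, k)
                       for train_idx, test_idx in kf.split(X_struct)]
        mean_score = float(np.mean(fold_scores))
        if mean_score > best_score:
            best_k, best_score = k, mean_score
    return best_k, best_score                                     # the paper reports K = 100 as best
```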