Medical Concept Embedding with Time-Aware Attention
Authors: Xiangrui Cai, Jinyang Gao, Kee Yuan Ngiam, Beng Chin Ooi, Ying Zhang, Xiaojie Yuan
IJCAI 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on public and proprietary datasets through clustering and nearest neighbour search tasks demonstrate the effectiveness of our model, showing that it outperforms five state-of-the-art baselines. |
| Researcher Affiliation | Collaboration | Xiangrui Cai1, Jinyang Gao2, Kee Yuan Ngiam3, Beng Chin Ooi2, Ying Zhang1, Xiaojie Yuan1. 1 Nankai University, China; 2 National University of Singapore, Singapore; 3 National University Health System, Singapore. {caixiangrui, zhangying, yuanxiaojie}@dbis.nankai.edu.cn, {jinyang.gao, ooibc}@comp.nus.edu.sg, kee_yuan_ngiam@nuhs.edu.sg |
| Pseudocode | No | Not found. |
| Open Source Code | Yes | The source code is available at https://github.com/XiangruiCAI/mce. |
| Open Datasets | Yes | DE-SynPUF is a public dataset provided by the Centers for Medicare and Medicaid Services (CMS). This dataset contains three years of data (2008-2010), providing inpatient, outpatient and carrier files, along with the beneficiary summary files. Although some variables have been synthesized to minimize the risk of re-identification, the sheer volume of EMR data can still provide useful insights for medical data processing. The diagnosis codes in this dataset follow the ICD-9 standard. (See the loading sketch below the table.) |
| Dataset Splits | No | The paper does not explicitly provide training/test/validation dataset splits. It mentions training with certain parameters but not the data splitting methodology for these sets. |
| Hardware Specification | No | The paper does not specify any hardware details such as CPU, GPU models, or specific computing infrastructure used for the experiments. |
| Software Dependencies | No | The paper refers to training models (Skip-gram, CBOW, Glove, wang2vec, med2vec) using their source code, but does not specify any software dependencies with version numbers (e.g., Python, PyTorch versions). |
| Experiment Setup | Yes | For preprocessing the datasets, medical concepts that occur fewer than 5 times are discarded for all models. During training, the rejection (sub-sampling) threshold is 10^-4. We use the same negative sampling estimation for CBOW, Skip-gram, wang2vec and our model, and the number of negative samples is 5. The initial learning rate is set to 0.05 for Skip-gram and 0.025 for the CBOW-based models, respectively. For the proposed model, the time unit is set to one week. The maximum number of contexts in our model is set to twice the context window size W used in the baselines, within which the number of contexts is typically 2W. All models are trained for 30 epochs on NUH2012 and 5 epochs on DE-SynPUF. The dimension of the medical concept vectors is 100 for all models. (See the configuration sketch below the table.) |
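
The Open Datasets row quotes the paper's description of DE-SynPUF. As a rough illustration of how the inpatient claims and their ICD-9 diagnosis codes might be grouped into per-patient visits before embedding training, the sketch below uses pandas; the file name and the columns `DESYNPUF_ID`, `CLM_FROM_DT` and `ICD9_DGNS_CD_1..10` are assumptions based on the public CMS release, not details taken from the paper.

```python
# Hedged illustration only: grouping DE-SynPUF inpatient diagnosis codes into visits.
# File and column names are assumptions based on the public CMS release.
import pandas as pd

claims = pd.read_csv("DE1_0_2008_to_2010_Inpatient_Claims_Sample_1.csv", dtype=str)

# Up to ten ICD-9 diagnosis codes are recorded per claim (assumed column naming).
dgns_cols = [c for c in claims.columns if c.startswith("ICD9_DGNS_CD")]

# One visit per (patient, claim start date): the list of its ICD-9 codes.
visits = (
    claims.melt(id_vars=["DESYNPUF_ID", "CLM_FROM_DT"],
                value_vars=dgns_cols, value_name="icd9")
          .dropna(subset=["icd9"])
          .groupby(["DESYNPUF_ID", "CLM_FROM_DT"])["icd9"]
          .apply(list)
)
print(visits.head())
```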
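
The Experiment Setup row lists the hyper-parameters shared across models. The following minimal sketch shows how the word2vec-style baselines (Skip-gram/CBOW) could be configured with those values, assuming gensim as the implementation; this is an illustration only, since the paper trains Skip-gram, CBOW, GloVe, wang2vec and med2vec from their own source code.

```python
# Hedged sketch of the reported baseline hyper-parameters using gensim's Word2Vec.
from gensim.models import Word2Vec

# Toy stand-in corpus: each "sentence" is one visit's list of concept codes.
corpus = [["401.9", "250.00", "272.4"], ["428.0", "584.9", "401.9"]] * 100

skipgram = Word2Vec(
    sentences=corpus,
    sg=1,             # Skip-gram (use sg=0 and alpha=0.025 for the CBOW-based setup)
    vector_size=100,  # concept vector dimension used for all models
    window=5,         # context window W; its exact value is not given in this table
    min_count=5,      # concepts occurring fewer than 5 times are discarded
    sample=1e-4,      # rejection (sub-sampling) threshold of 10^-4
    negative=5,       # 5 negative samples per positive pair
    alpha=0.05,       # initial learning rate reported for Skip-gram
    epochs=5,         # 5 epochs for DE-SynPUF (30 for NUH2012)
)
print(skipgram.wv.most_similar("401.9", topn=2))
```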