Medical Concept Embedding with Time-Aware Attention
Authors: Xiangrui Cai, Jinyang Gao, Kee Yuan Ngiam, Beng Chin Ooi, Ying Zhang, Xiaojie Yuan
IJCAI 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on public and proprietary datasets through clustering and nearest neighbour search tasks demonstrate the effectiveness of our model, showing that it outperforms five state-of-the-art baselines. |
| Researcher Affiliation | Collaboration | Xiangrui Cai1, Jinyang Gao2, Kee Yuan Ngiam3, Beng Chin Ooi2, Ying Zhang1, Xiaojie Yuan1. 1 Nankai University, China; 2 National University of Singapore, Singapore; 3 National University Health System, Singapore. {caixiangrui, zhangying, yuanxiaojie}@dbis.nankai.edu.cn, {jinyang.gao, ooibc}@comp.nus.edu.sg, kee_yuan_ngiam@nuhs.edu.sg |
| Pseudocode | No | Not found. |
| Open Source Code | Yes | The source code is available at https://github.com/XiangruiCAI/mce. |
| Open Datasets | Yes | DE-SynPUF is a public dataset provided by the Centers for Medicare and Medicaid Services (CMS). This dataset contains three years of data (2008-2010), providing inpatient, outpatient and carrier files, along with the beneficiary summary files. Although some variables have been synthesized to minimize the risk of re-identification, the sheer volume of EMR data can still provide useful insights for medical data processing. The diagnosis codes in this dataset follow the ICD-9 standard. (See the loading sketch below the table.) |
| Dataset Splits | No | The paper does not explicitly provide training/test/validation dataset splits. It mentions training with certain parameters but not the data splitting methodology for these sets. |
| Hardware Specification | No | The paper does not specify any hardware details such as CPU, GPU models, or specific computing infrastructure used for the experiments. |
| Software Dependencies | No | The paper refers to training models (Skip-gram, CBOW, Glove, wang2vec, med2vec) using their source code, but does not specify any software dependencies with version numbers (e.g., Python, PyTorch versions). |
| Experiment Setup | Yes | For preprocessing the datasets, medical concepts that occur fewer than 5 times are discarded for all models. During training, the rejection (sub-sampling) threshold is 10^-4. We use the same negative sampling estimation for CBOW, Skip-gram, wang2vec and our model, and the number of negative samples is 5. The initial learning rate is set to 0.05 for Skip-gram and 0.025 for the CBOW-based models, respectively. For the proposed model, the time unit is set to one week. The maximum number of contexts in our model is set to twice the context window size W used in the baselines, within which the number of contexts is typically 2W. All models are trained for 30 epochs on NUH2012 and 5 epochs on DE-SynPUF. The dimension of the medical concept vectors is 100 for all models. (See the configuration sketch below the table.) |
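
The Open Datasets row quotes the paper's description of DE-SynPUF. As a rough illustration of how the inpatient claims and their ICD-9 diagnosis codes might be grouped into per-patient visits before embedding training, the sketch below uses pandas; the file name and the columns `DESYNPUF_ID`, `CLM_FROM_DT` and `ICD9_DGNS_CD_1..10` are assumptions based on the public CMS release, not details taken from the paper.

```python
# Hedged illustration only: grouping DE-SynPUF inpatient diagnosis codes into visits.
# File and column names are assumptions based on the public CMS release.
import pandas as pd

claims = pd.read_csv("DE1_0_2008_to_2010_Inpatient_Claims_Sample_1.csv", dtype=str)

# Up to ten ICD-9 diagnosis codes are recorded per claim (assumed column naming).
dgns_cols = [c for c in claims.columns if c.startswith("ICD9_DGNS_CD")]

# One visit per (patient, claim start date): the list of its ICD-9 codes.
visits = (
    claims.melt(id_vars=["DESYNPUF_ID", "CLM_FROM_DT"],
                value_vars=dgns_cols, value_name="icd9")
          .dropna(subset=["icd9"])
          .groupby(["DESYNPUF_ID", "CLM_FROM_DT"])["icd9"]
          .apply(list)
)
print(visits.head())
```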
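
The Experiment Setup row lists the hyper-parameters shared across models. The following minimal sketch shows how the word2vec-style baselines (Skip-gram/CBOW) could be configured with those values, assuming gensim as the implementation; this is an illustration only, since the paper trains Skip-gram, CBOW, GloVe, wang2vec and med2vec from their own source code.

```python
# Hedged sketch of the reported baseline hyper-parameters using gensim's Word2Vec.
from gensim.models import Word2Vec

# Toy stand-in corpus: each "sentence" is one visit's list of concept codes.
corpus = [["401.9", "250.00", "272.4"], ["428.0", "584.9", "401.9"]] * 100

skipgram = Word2Vec(
    sentences=corpus,
    sg=1,             # Skip-gram (use sg=0 and alpha=0.025 for the CBOW-based setup)
    vector_size=100,  # concept vector dimension used for all models
    window=5,         # context window W; its exact value is not given in this table
    min_count=5,      # concepts occurring fewer than 5 times are discarded
    sample=1e-4,      # rejection (sub-sampling) threshold of 10^-4
    negative=5,       # 5 negative samples per positive pair
    alpha=0.05,       # initial learning rate reported for Skip-gram
    epochs=5,         # 5 epochs for DE-SynPUF (30 for NUH2012)
)
print(skipgram.wv.most_similar("401.9", topn=2))
```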