A Novel Sequence-to-Subgraph Framework for Diagnosis Classification

Authors: Jun Chen, Quan Yuan, Chao Lu, Haifeng Huang

IJCAI 2021

Reproducibility assessment (variable, result, and LLM response):

Research Type: Experimental
  The evaluation, conducted on both real-world English and Chinese datasets, shows that the proposed method outperforms state-of-the-art deep-learning-based diagnosis classification models.

Researcher Affiliation: Industry
  Jun Chen, Quan Yuan, Chao Lu and Haifeng Huang; Baidu Inc, Beijing 100193, China. {chenjun22, yuanquan02, luchao, huanghaifeng}@baidu.com

Pseudocode: No
  The paper describes the model architecture and mathematical equations but does not include any explicitly labeled pseudocode or algorithm blocks.

Open Source Code: No
  No explicit statement or link providing access to open-source code for the proposed method (SHiDAN) was found. A link is provided for the LPA algorithm, which is a third-party tool.

Open Datasets: Yes
  MIMIC-III-50: a public English EMR dataset consisting of the 50 most frequent diagnosis codes [Mullenbach et al., 2018] (https://github.com/jamesmullenbach/caml-mimic). Each EMR has one or more diagnosis codes, so MIMIC-III-50 is used to evaluate the proposed method on multi-label classification.

Dataset Splits: No
  The paper does not explicitly state the training, validation, and test splits used (e.g., percentages or sample counts per split). Although MIMIC-III-50 has standard splits, the paper does not confirm that those splits were used here.

Hardware Specification: No
  The paper does not provide any details about the hardware (e.g., CPU or GPU models, memory) used to run the experiments.

Software Dependencies: No
  The paper mentions existing packages such as CliNER and an LPA implementation, but it does not specify version numbers for any software dependencies or libraries.

Experiment Setup: Yes
  By default, word embeddings and entity embeddings have 100 dimensions, the latent feature m has 128 dimensions, and the dropout rate is empirically set to 0.2. On MIMIC-III-50, each model is trained for 12 epochs with batch size 16 and a maximum of K = 15 subgraphs. On CHS-AD-200, each model is trained for 35 epochs with batch size 64 and a maximum of K = 6 subgraphs.
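The reported hyperparameters can be gathered into a minimal configuration sketch. This is only an illustration of the values quoted above; all field and function names here are my own, not from the paper.

```python
# Hyperparameters as reported in the paper, arranged per dataset.
# Shared defaults apply to both datasets; per-dataset entries add
# the training schedule and the subgraph cap K.
BASE = {
    "word_embedding_dim": 100,
    "entity_embedding_dim": 100,
    "latent_feature_dim": 128,  # dimensions of the latent feature m
    "dropout": 0.2,
}

DATASETS = {
    "MIMIC-III-50": {**BASE, "epochs": 12, "batch_size": 16, "max_subgraphs": 15},
    "CHS-AD-200": {**BASE, "epochs": 35, "batch_size": 64, "max_subgraphs": 6},
}

def config_for(name: str) -> dict:
    """Return the reported training configuration for a dataset."""
    return DATASETS[name]
```

For example, `config_for("MIMIC-III-50")["batch_size"]` yields 16, matching the setup quoted from the paper.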