A Novel Sequence-to-Subgraph Framework for Diagnosis Classification

Authors: Jun Chen, Quan Yuan, Chao Lu, Haifeng Huang

IJCAI 2021

Reproducibility assessment (variable, result, and LLM response):

Research Type: Experimental
  The evaluation, conducted on both real-world English and Chinese datasets, shows that the proposed method outperforms state-of-the-art deep-learning-based diagnosis classification models.

Researcher Affiliation: Industry
  Jun Chen, Quan Yuan, Chao Lu and Haifeng Huang; Baidu Inc, Beijing 100193, China. {chenjun22, yuanquan02, luchao, huanghaifeng}@baidu.com

Pseudocode: No
  The paper describes the model architecture and mathematical equations but does not include any explicitly labeled pseudocode or algorithm blocks.

Open Source Code: No
  No explicit statement or link providing access to open-source code for the proposed method (SHiDAN) was found. A link is provided for the LPA algorithm, which is a third-party tool.

Open Datasets: Yes
  MIMIC-III-50: a public English EMR dataset consisting of the 50 most frequent diagnosis codes [Mullenbach et al., 2018] (https://github.com/jamesmullenbach/caml-mimic). Each EMR has one or more diagnosis codes, so MIMIC-III-50 is used to evaluate the proposed method on multi-label classification.

Dataset Splits: No
  The paper does not explicitly state the training, validation, and test splits used (e.g., percentages or sample counts per split). Although MIMIC-III-50 has standard splits, the paper does not confirm that those splits were used here.

Hardware Specification: No
  The paper does not provide any details about the hardware (e.g., CPU or GPU models, memory) used to run the experiments.

Software Dependencies: No
  The paper mentions existing packages such as CliNER and an LPA implementation, but it does not specify version numbers for any software dependencies or libraries.

Experiment Setup: Yes
  By default, word embeddings and entity embeddings have 100 dimensions, the latent feature m has 128 dimensions, and the dropout rate is empirically set to 0.2. On MIMIC-III-50, each model is trained for 12 epochs with batch size 16 and a maximum of K = 15 subgraphs. On CHS-AD-200, each model is trained for 35 epochs with batch size 64 and a maximum of K = 6 subgraphs.
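The reported hyperparameters can be gathered into a minimal configuration sketch. This is only an illustration of the values quoted above; all field and function names here are my own, not from the paper.

```python
# Hyperparameters as reported in the paper, arranged per dataset.
# Shared defaults apply to both datasets; per-dataset entries add
# the training schedule and the subgraph cap K.
BASE = {
    "word_embedding_dim": 100,
    "entity_embedding_dim": 100,
    "latent_feature_dim": 128,  # dimensions of the latent feature m
    "dropout": 0.2,
}

DATASETS = {
    "MIMIC-III-50": {**BASE, "epochs": 12, "batch_size": 16, "max_subgraphs": 15},
    "CHS-AD-200": {**BASE, "epochs": 35, "batch_size": 64, "max_subgraphs": 6},
}

def config_for(name: str) -> dict:
    """Return the reported training configuration for a dataset."""
    return DATASETS[name]
```

For example, `config_for("MIMIC-III-50")["batch_size"]` yields 16, matching the setup quoted from the paper.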