When Radiology Report Generation Meets Knowledge Graph
Authors: Yixiao Zhang, Xiaosong Wang, Ziyue Xu, Qihang Yu, Alan Yuille, Daguang Xu (pp. 12910–12917)
AAAI 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results demonstrate the superior performance of the methods integrated with the proposed graph embedding module on the publicly accessible IU-RR dataset of chest radiographs (Demner-Fushman et al. 2015), compared with previous approaches under both the conventional evaluation metrics commonly adopted for image captioning and the proposed MIRQI metrics. In classification, the model performs better in most categories and achieves a 2% average improvement in Area Under Curve (AUC). In report generation, it obtains better or equivalent performance on conventional metrics while at the same time scoring significantly higher on MIRQI. The reported experiments explore and validate the advantage of including the graph embedding module in radiology abnormality classification and report generation. |
| Researcher Affiliation | Collaboration | Yixiao Zhang,1 Xiaosong Wang,2 Ziyue Xu,2 Qihang Yu,1 Alan Yuille,1 Daguang Xu2 1Department of Computer Science, Johns Hopkins University, Baltimore, USA 2NVIDIA Corporation, Bethesda, USA |
| Pseudocode | No | The paper includes mathematical formulas and descriptions of the model, but no formal pseudocode block or algorithm. |
| Open Source Code | No | Importantly, we will make our code (both the model and metrics) and data split publicly available to promote a fair comparison for future evaluation. |
| Open Datasets | Yes | We evaluate our work using the publicly accessible IU-RR dataset (Demner-Fushman et al. 2015). The dataset contains 3955 radiology reports, each associated with one frontal view chest x-ray image and optionally one lateral view image. |
| Dataset Splits | Yes | To evaluate our models, we employed stratified five-fold cross-validation, which ensures that the number of samples in each fold is roughly the same for every finding category. The split of data within the same category is totally random. The average scores over the five folds are reported. |
| Hardware Specification | No | The paper does not mention any specific GPU, CPU, or cloud hardware (e.g., NVIDIA V100, Intel Xeon, AWS instance types) used for running experiments. |
| Software Dependencies | No | The paper mentions using DenseNet-121 and LSTM units but does not specify software versions for libraries like PyTorch, TensorFlow, or CUDA, which are necessary for full reproducibility. |
| Experiment Setup | Yes | Input image size is 512×512, and the feature map from DenseNet-121 block 4 is 1024×16×16. We randomly crop a 512×512 region, with padding if needed; no other data augmentation is used in any experiment. We included 20 finding keywords as disease categories, which is more complete than previous works. We tokenize all the words in the reports and drop infrequent tokens with frequency less than three. w_pos is set to 0.8 and w_attr is set to 0.2. |
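The stratified five-fold protocol quoted under "Dataset Splits" can be sketched with scikit-learn's `StratifiedKFold`. The synthetic labels below are a stand-in for the IU-RR annotations; assigning one finding category per study is an illustrative simplification (the paper's 20 finding keywords are multi-label, which needs extra care to stratify), and the per-fold "score" is a dummy value, not the paper's metric.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Synthetic stand-in for the IU-RR labels: 3955 reports, 20 finding
# categories, one label per study (an illustrative simplification).
rng = np.random.default_rng(0)
labels = rng.integers(0, 20, size=3955)
indices = np.arange(len(labels))

# Stratified split keeps each category's count roughly equal per fold,
# with the split within each category random, as the paper describes.
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
fold_scores = []
for fold, (train_idx, val_idx) in enumerate(skf.split(indices, labels)):
    # Train and evaluate the model on this fold here; we record the
    # validation fraction as a dummy per-fold score.
    fold_scores.append(len(val_idx) / len(labels))

# The paper reports the average score over the five folds.
mean_score = float(np.mean(fold_scores))
```

Each validation fold holds roughly one fifth of the data, so `mean_score` here is about 0.2; in the real pipeline each fold's AUC or MIRQI score would be averaged instead.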
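The only augmentation quoted under "Experiment Setup" is a random 512×512 crop with padding when the image is smaller than the target. A minimal NumPy sketch for a grayscale image, assuming zero-padding on the bottom/right edges (the padding placement is not specified in the paper):

```python
import numpy as np

def random_crop_with_padding(img: np.ndarray, size: int = 512,
                             rng: np.random.Generator | None = None) -> np.ndarray:
    """Randomly crop a size x size region from a 2-D grayscale image,
    zero-padding first if the image is smaller than the target."""
    rng = rng or np.random.default_rng()
    h, w = img.shape
    pad_h, pad_w = max(0, size - h), max(0, size - w)
    if pad_h or pad_w:
        # Pad on the bottom/right (an assumption; the paper does not say where).
        img = np.pad(img, ((0, pad_h), (0, pad_w)), mode="constant")
        h, w = img.shape
    # Pick a random top-left corner so the crop stays inside the image.
    top = rng.integers(0, h - size + 1)
    left = rng.integers(0, w - size + 1)
    return img[top:top + size, left:left + size]
```

Usage: `random_crop_with_padding(np.zeros((600, 400)), 512)` pads the 400-pixel width up to 512 and crops a 512×512 window from the 600-pixel height.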
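The quoted weights w_pos = 0.8 and w_attr = 0.2 suggest a linear blend of a positive-finding agreement term and an attribute agreement term in the proposed MIRQI scoring. The combination below is a hedged sketch of that blend only; the per-term agreement functions (`pos_match`, `attr_match`) are placeholders, not the paper's actual MIRQI implementation.

```python
# Weights quoted in the "Experiment Setup" row of the review above.
W_POS = 0.8   # weight on positive-finding (entity-level) agreement
W_ATTR = 0.2  # weight on attribute-level agreement

def weighted_score(pos_match: float, attr_match: float) -> float:
    """Blend entity-level and attribute-level agreement into one score.

    Both inputs are assumed to lie in [0, 1]; how they are computed is
    left unspecified here, as the paper's MIRQI details are not quoted.
    """
    return W_POS * pos_match + W_ATTR * attr_match
```

With fully matched findings but only half-matched attributes, `weighted_score(1.0, 0.5)` blends to 0.9, illustrating how the heavier w_pos makes entity agreement dominate the score.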