When Radiology Report Generation Meets Knowledge Graph
Authors: Yixiao Zhang, Xiaosong Wang, Ziyue Xu, Qihang Yu, Alan Yuille, Daguang Xu (pp. 12910–12917)
AAAI 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results demonstrate the superior performance of the methods integrated with the proposed graph embedding module on the publicly accessible IU-RR dataset of chest radiographs (Demner-Fushman et al. 2015), compared with previous approaches under both the conventional evaluation metrics commonly adopted for image captioning and the proposed MIRQI metrics. In classification, the model performs better in most categories and achieves a 2% average improvement in Area Under Curve (AUC). In report generation, it obtains better or equivalent performance on conventional metrics while at the same time scoring significantly higher on MIRQI. The reported experiments explore and validate the advantage of including the graph embedding module in radiology abnormality classification and report generation. |
| Researcher Affiliation | Collaboration | Yixiao Zhang,1 Xiaosong Wang,2 Ziyue Xu,2 Qihang Yu,1 Alan Yuille,1 Daguang Xu2 1Department of Computer Science, Johns Hopkins University, Baltimore, USA 2NVIDIA Corporation, Bethesda, USA |
| Pseudocode | No | The paper includes mathematical formulas and descriptions of the model, but no formal pseudocode block or algorithm. |
| Open Source Code | No | Importantly, we will make our code (both the model and metrics) and data split publicly available to promote a fair comparison for future evaluation. |
| Open Datasets | Yes | We evaluate our work using the publicly accessible IU-RR dataset (Demner-Fushman et al. 2015). The dataset contains 3955 radiology reports, each associated with one frontal view chest x-ray image and optionally one lateral view image. |
| Dataset Splits | Yes | To evaluate our models, we employed stratified five-fold cross-validation, which ensures that the number of samples in each fold is roughly the same for every finding category. The split of data within the same category is totally random. The average scores over the five folds are reported. |
| Hardware Specification | No | The paper does not mention any specific GPU, CPU, or cloud hardware (e.g., NVIDIA V100, Intel Xeon, AWS instance types) used for running experiments. |
| Software Dependencies | No | The paper mentions using DenseNet-121 and LSTM units but does not specify software versions for libraries like PyTorch, TensorFlow, or CUDA, which are necessary for full reproducibility. |
| Experiment Setup | Yes | Input image size is 512×512, and the feature map from DenseNet-121 block 4 is 1024×16×16. We randomly crop a 512×512 region, with padding if needed; no other data augmentation is used in any experiment. We included 20 finding keywords as disease categories, which is more complete than previous works. We tokenize all the words in the reports and drop infrequent tokens with frequency less than three. w_pos is set to 0.8 and w_attr is set to 0.2. |
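The stratified five-fold protocol quoted under "Dataset Splits" can be sketched with scikit-learn's `StratifiedKFold`. The synthetic labels below are a stand-in for the IU-RR annotations; assigning one finding category per study is an illustrative simplification (the paper's 20 finding keywords are multi-label, which needs extra care to stratify), and the per-fold "score" is a dummy value, not the paper's metric.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Synthetic stand-in for the IU-RR labels: 3955 reports, 20 finding
# categories, one label per study (an illustrative simplification).
rng = np.random.default_rng(0)
labels = rng.integers(0, 20, size=3955)
indices = np.arange(len(labels))

# Stratified split keeps each category's count roughly equal per fold,
# with the split within each category random, as the paper describes.
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
fold_scores = []
for fold, (train_idx, val_idx) in enumerate(skf.split(indices, labels)):
    # Train and evaluate the model on this fold here; we record the
    # validation fraction as a dummy per-fold score.
    fold_scores.append(len(val_idx) / len(labels))

# The paper reports the average score over the five folds.
mean_score = float(np.mean(fold_scores))
```

Each validation fold holds roughly one fifth of the data, so `mean_score` here is about 0.2; in the real pipeline each fold's AUC or MIRQI score would be averaged instead.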
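The only augmentation quoted under "Experiment Setup" is a random 512×512 crop with padding when the image is smaller than the target. A minimal NumPy sketch for a grayscale image, assuming zero-padding on the bottom/right edges (the padding placement is not specified in the paper):

```python
import numpy as np

def random_crop_with_padding(img: np.ndarray, size: int = 512,
                             rng: np.random.Generator | None = None) -> np.ndarray:
    """Randomly crop a size x size region from a 2-D grayscale image,
    zero-padding first if the image is smaller than the target."""
    rng = rng or np.random.default_rng()
    h, w = img.shape
    pad_h, pad_w = max(0, size - h), max(0, size - w)
    if pad_h or pad_w:
        # Pad on the bottom/right (an assumption; the paper does not say where).
        img = np.pad(img, ((0, pad_h), (0, pad_w)), mode="constant")
        h, w = img.shape
    # Pick a random top-left corner so the crop stays inside the image.
    top = rng.integers(0, h - size + 1)
    left = rng.integers(0, w - size + 1)
    return img[top:top + size, left:left + size]
```

Usage: `random_crop_with_padding(np.zeros((600, 400)), 512)` pads the 400-pixel width up to 512 and crops a 512×512 window from the 600-pixel height.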
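The quoted weights w_pos = 0.8 and w_attr = 0.2 suggest a linear blend of a positive-finding agreement term and an attribute agreement term in the proposed MIRQI scoring. The combination below is a hedged sketch of that blend only; the per-term agreement functions (`pos_match`, `attr_match`) are placeholders, not the paper's actual MIRQI implementation.

```python
# Weights quoted in the "Experiment Setup" row of the review above.
W_POS = 0.8   # weight on positive-finding (entity-level) agreement
W_ATTR = 0.2  # weight on attribute-level agreement

def weighted_score(pos_match: float, attr_match: float) -> float:
    """Blend entity-level and attribute-level agreement into one score.

    Both inputs are assumed to lie in [0, 1]; how they are computed is
    left unspecified here, as the paper's MIRQI details are not quoted.
    """
    return W_POS * pos_match + W_ATTR * attr_match
```

With fully matched findings but only half-matched attributes, `weighted_score(1.0, 0.5)` blends to 0.9, illustrating how the heavier w_pos makes entity agreement dominate the score.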