Automatic Radiology Reports Generation via Memory Alignment Network
Authors: Hongyu Shen, Mingtao Pei, Juncai Liu, Zhaoxing Tian
AAAI 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The comparison experiments with other alignment methods show that the proposed alignment method is less costly and more effective. The proposed approach achieves better performance than state-of-the-art approaches on two public datasets, IU X-Ray and MIMIC-CXR, which further demonstrates the effectiveness of the proposed alignment method. |
| Researcher Affiliation | Academia | Hongyu Shen¹, Mingtao Pei¹, Juncai Liu², Zhaoxing Tian³. ¹Beijing Institute of Technology; ²Shandong University of Science and Technology; ³Beijing Jishuitan Hospital |
| Pseudocode | No | The paper describes the model architecture and processes using text and mathematical formulas, but it does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any concrete access to source code for the methodology described, such as a repository link or an explicit statement of code release. |
| Open Datasets | Yes | We evaluate our method on two public reports generation datasets, MIMIC-CXR and IU X-Ray. The MIMIC-CXR is currently the largest dataset for the radiograph reports generation task, which contains 337,110 chest X-ray images and 227,835 reports. For a fair comparison with previous methods, we use the official splitting for training, validation and testing. The IU X-Ray consists of 7,470 frontal and lateral-view chest X-ray images and 3,955 corresponding reports. Following the common approach, we randomly split it into training, validation, and testing sets with the ratio of 7:1:2. |
| Dataset Splits | Yes | For a fair comparison with previous methods, we use the official splitting for training, validation and testing. The IU X-Ray consists of 7,470 frontal and lateral-view chest X-ray images and 3,955 corresponding reports. Following the common approach, we randomly split it into training, validation, and testing sets with the ratio of 7:1:2. (A sketch of this split follows the table.) |
| Hardware Specification | Yes | All experiments are run on the Nvidia Geforce 3090 GPUs. |
| Software Dependencies | No | The paper mentions using DenseNet121, ImageNet, BERT-base model, and AdamW optimizer, but it does not specify any version numbers for these or other software libraries or frameworks. For example, it does not state versions for Python, PyTorch, TensorFlow, or specific BERT implementations. |
| Experiment Setup | Yes | For MIMIC-CXR, the maximum length of the report is set as 100 and for IU X-Ray it is 60. We adopt the DenseNet121 (Huang et al. 2017) pre-trained on ImageNet (Russakovsky et al. 2015) as the visual features extractor. The frequency threshold of the tokenizer is set to 3, obtaining 7,861 and 764 tokens from MIMIC-CXR and IU X-Ray, respectively. We employ the pre-trained uncased BERT-base model (Devlin et al. 2019) as the BERT encoders, which has 12 encoder layers and a hidden dimension of 768. The dimension of the memory vectors is the same as the hidden dimension. For MIMIC-CXR, the number of vectors in the memory matrix η is set to 100 and Heads Num is set to 4; for IU X-Ray they are 64 and 2, respectively. During the inference stage, we adopt a beam search strategy with a beam size of 5 for sampling reports. We train our model under cross-entropy loss; the learning rates of the visual extractor and the other parameters are set to 5e-5 and 1e-4, respectively. The AdamW (Loshchilov and Hutter 2019) optimizer is adopted with a weight decay of 0.01. For MIMIC-CXR the batch size is set as 48, and for IU X-Ray it is set as 24. (Hedged sketches of the optimizer grouping and beam-search settings follow this table.) |
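
The Dataset Splits row describes a random 7:1:2 partition of IU X-Ray at the report level. A minimal sketch of how such a split could be reproduced is below; the fixed seed and the `report_ids` input are assumptions, since the paper does not publish its split indices or random seed.

```python
import random

def split_iu_xray(report_ids, seed=42):
    """Randomly partition IU X-Ray report IDs into train/val/test
    with the 7:1:2 ratio described in the paper.

    The seed is an assumption; the paper does not state one, so the
    exact membership of each split cannot be reproduced from it.
    """
    ids = list(report_ids)
    rng = random.Random(seed)
    rng.shuffle(ids)

    n = len(ids)
    n_train = int(0.7 * n)
    n_val = int(0.1 * n)

    train = ids[:n_train]
    val = ids[n_train:n_train + n_val]
    test = ids[n_train + n_val:]  # the remaining ~20%
    return train, val, test

# Example with the 3,955 reports cited for IU X-Ray.
train, val, test = split_iu_xray(range(3955))
print(len(train), len(val), len(test))  # 2768 395 792
```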
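
The Experiment Setup row implies two learning rates under one AdamW optimizer: 5e-5 for the DenseNet121 visual extractor and 1e-4 for all other parameters, with weight decay 0.01. The PyTorch sketch below shows that parameter grouping; `ReportGenerator` and its `visual_extractor` attribute are hypothetical names standing in for the authors' unreleased code, and the `weights="IMAGENET1K_V1"` argument assumes torchvision ≥ 0.13.

```python
import torch
import torchvision

# Hypothetical stand-in for the paper's model: a DenseNet121 visual
# extractor (pre-trained on ImageNet) plus the rest of the network.
class ReportGenerator(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.visual_extractor = torchvision.models.densenet121(
            weights="IMAGENET1K_V1"  # assumes torchvision >= 0.13
        )
        # ... BERT encoders, memory matrix, and decoder would go here ...

model = ReportGenerator()

# Two parameter groups, as described in the paper: 5e-5 for the
# visual extractor, 1e-4 for everything else. Membership is tracked
# by object identity to separate the groups safely.
visual_param_ids = {id(p) for p in model.visual_extractor.parameters()}
other_params = [p for p in model.parameters() if id(p) not in visual_param_ids]

optimizer = torch.optim.AdamW(
    [
        {"params": list(model.visual_extractor.parameters()), "lr": 5e-5},
        {"params": other_params, "lr": 1e-4},
    ],
    weight_decay=0.01,
)
```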
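
At inference time the paper samples reports with beam search (beam size 5, maximum report length 100 for MIMIC-CXR). The generic decoding loop below illustrates those settings; `step_fn`, `bos_id`, and `eos_id` are an assumed interface, not the authors' implementation.

```python
import torch

def beam_search(step_fn, bos_id, eos_id, beam_size=5, max_len=100):
    """Minimal beam-search sketch with the paper's settings: beam
    size 5, maximum report length 100 tokens (MIMIC-CXR).

    `step_fn(tokens)` is an assumed interface: it takes a LongTensor
    of shape (1, seq_len) and returns next-token log-probabilities
    of shape (1, vocab).
    """
    beams = [(0.0, [bos_id])]  # (cumulative log-prob, token sequence)
    for _ in range(max_len):
        candidates = []
        for score, seq in beams:
            if seq[-1] == eos_id:  # finished hypotheses carry over
                candidates.append((score, seq))
                continue
            log_probs = step_fn(torch.tensor([seq]))[0]
            top_lp, top_ids = log_probs.topk(beam_size)
            for lp, tok in zip(top_lp.tolist(), top_ids.tolist()):
                candidates.append((score + lp, seq + [tok]))
        # keep only the best `beam_size` hypotheses
        beams = sorted(candidates, key=lambda c: c[0], reverse=True)[:beam_size]
        if all(seq[-1] == eos_id for _, seq in beams):
            break
    return beams[0][1]  # highest-scoring report

# Example with a dummy model over a 10-token vocabulary.
dummy = lambda toks: torch.log_softmax(torch.randn(toks.shape[0], 10), dim=-1)
print(beam_search(dummy, bos_id=1, eos_id=2, max_len=10))
```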