Image-aware Evaluation of Generated Medical Reports

Authors: Gefen Dawidowicz, Elad Hirsch, Ayellet Tal

NeurIPS 2024

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We analyze our proposed metric by comparing its alignment with radiologists' judgment and by using a controlled dataset to demonstrate the domain sensitivity of the various metrics across different error types. Further details on the experiments and findings are provided below. |
| Researcher Affiliation | Academia | Gefen Dawidowicz, Elad Hirsch, Ayellet Tal; Technion - Israel Institute of Technology |
| Pseudocode | No | The paper provides mathematical formulas for T(i, g, r) and VLScore but does not include structured pseudocode or algorithm blocks. (A hedged triangle-area sketch of one possible computation follows the table.) |
| Open Source Code | No | The code for computing our metric and the perturbed dataset will be released. (A hypothetical perturbation sketch follows the table.) |
| Open Datasets | Yes | The recent work of [34] introduces an evaluation dataset, called Radiology Report Expert Evaluation (ReXVal), comprising 200 pairs of reports. Each pair consists of a reference report (drawn from the MIMIC-CXR training set) and a candidate report retrieved as the best match from the same dataset, according to one of the following automated metrics: BLEU, BERTScore, CheXbert, and RadGraph F1. Each pair was annotated by 6 board-certified radiologists, who provide the number of errors in the candidate report across several error categories, such as omission of finding or false location of finding. (A rank-correlation sketch of this alignment protocol follows the table.) |
| Dataset Splits | Yes | Toward this end, we sampled a subset of 440 pairs of images and reports from the validation and test sets of MIMIC-CXR. |
| Hardware Specification | No | The paper's NeurIPS Paper Checklist states that hardware specifications are "Addressed in the supplementary materials," so they are not provided in the main text. |
| Software Dependencies | No | The paper does not provide specific ancillary software details, such as library or solver names with version numbers, needed to replicate the experiment. |
| Experiment Setup | No | The paper's NeurIPS Paper Checklist states that experimental details appear in Section 4 and the supplementary materials; however, Section 4 covers the evaluation methods and data without giving specific hyperparameter values or training configurations in the main text. |