Image-aware Evaluation of Generated Medical Reports
Authors: Gefen Dawidowicz, Elad Hirsch, Ayellet Tal
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We analyze our proposed metric by comparing its alignment with radiologists' judgment and by using a controlled dataset to demonstrate the domain sensitivity of the various metrics across different error types. Further details on the experiments and findings are provided below. |
| Researcher Affiliation | Academia | Gefen Dawidowicz, Elad Hirsch, Ayellet Tal; Technion - Israel Institute of Technology |
| Pseudocode | No | The paper provides mathematical formulas for T(i, g, r) and VLScore but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | No | The code for computing our metric and the perturbed dataset will be released. |
| Open Datasets | Yes | The recent work of [34] introduces an evaluation dataset, called Radiology Report Expert Evaluation (ReXVal), comprising 200 pairs of reports. Each pair consists of a reference report (drawn from the MIMIC-CXR training set) and a candidate report retrieved as the best match from the same dataset, according to one of the following automated metrics: BLEU, BERTScore, CheXbert, and RadGraph F1. Each pair was annotated by 6 board-certified radiologists, who provide the number of errors in the candidate report across several error categories, such as omission of finding or false location of finding. |
| Dataset Splits | Yes | Toward this end, we sampled a subset of 440 pairs of images and reports from the validation and test sets of MIMIC-CXR. |
| Hardware Specification | No | The paper states in its NeurIPS Paper Checklist that hardware specifications are "Addressed in the supplementary materials," meaning they are not provided in the main text. |
| Software Dependencies | No | The paper does not provide specific ancillary software details, such as library or solver names with version numbers, needed to replicate the experiment. |
| Experiment Setup | No | The paper states in its NeurIPS Paper Checklist that experimental details appear in Section 4 and in the supplementary materials. However, Section 4 describes the evaluation methods and data but gives no specific hyperparameter values or training configurations in the main text. |
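
The alignment analysis mentioned in the Research Type and Open Datasets rows can be illustrated with a small sketch. It assumes that alignment is measured as a rank correlation (Kendall's tau-b here) between a metric's scores and the error counts averaged over the radiologists; this summary does not state which statistic the paper uses, and the function names and toy numbers below are hypothetical, not taken from ReXVal.

```python
from itertools import combinations
from math import sqrt
from statistics import mean


def mean_error_counts(rater_errors):
    """Average the per-rater error counts for each report pair."""
    return [mean(counts) for counts in rater_errors]


def kendall_tau_b(x, y):
    """Kendall's tau-b rank correlation (with tie correction)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    concordant = discordant = ties_x_only = ties_y_only = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue  # tied in both variables: excluded from every count
        if dx == 0:
            ties_x_only += 1
        elif dy == 0:
            ties_y_only += 1
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    untied = concordant + discordant
    denom = sqrt((untied + ties_x_only) * (untied + ties_y_only))
    return (concordant - discordant) / denom if denom else float("nan")


# Hypothetical toy data: 4 report pairs, each annotated by 6 raters.
rater_errors = [
    [0, 0, 1, 0, 0, 0],
    [2, 3, 2, 2, 3, 2],
    [5, 4, 5, 6, 5, 5],
    [1, 1, 0, 1, 1, 2],
]
# Hypothetical metric scores (higher = candidate judged closer to reference).
metric_scores = [0.9, 0.5, 0.1, 0.7]

tau = kendall_tau_b(metric_scores, mean_error_counts(rater_errors))
print(tau)  # -1.0: scores fall exactly as the mean error count rises
```

Under this assumption, a metric that agrees with the radiologists yields a strongly negative correlation, since more annotated errors should correspond to a lower score.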