Image-aware Evaluation of Generated Medical Reports

Authors: Gefen Dawidowicz, Elad Hirsch, Ayellet Tal

NeurIPS 2024

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We analyze our proposed metric by comparing its alignment with radiologists' judgment and by using a controlled dataset to demonstrate the domain sensitivity of the various metrics across different error types. Further details on the experiments and findings are provided below. |
| Researcher Affiliation | Academia | Gefen Dawidowicz, Elad Hirsch, Ayellet Tal; Technion - Israel Institute of Technology |
| Pseudocode | No | The paper provides mathematical formulas for T(i, g, r) and VLScore but does not include structured pseudocode or algorithm blocks. (A hedged triangle-area sketch of one possible computation follows the table.) |
| Open Source Code | No | The code for computing our metric and the perturbed dataset will be released. (A hypothetical perturbation sketch follows the table.) |
| Open Datasets | Yes | The recent work of [34] introduces an evaluation dataset, called Radiology Report Expert Evaluation (ReXVal), comprising 200 pairs of reports. Each pair consists of a reference report (drawn from the MIMIC-CXR training set) and a candidate report retrieved as the best match from the same dataset, according to one of the following automated metrics: BLEU, BERTScore, CheXbert, and RadGraph F1. Each pair was annotated by 6 board-certified radiologists, who provide the number of errors in the candidate report across several error categories, such as omission of finding or false location of finding. (A rank-correlation sketch of this alignment protocol follows the table.) |
| Dataset Splits | Yes | Toward this end, we sampled a subset of 440 pairs of images and reports from the validation and test sets of MIMIC-CXR. |
| Hardware Specification | No | The paper's NeurIPS Paper Checklist states that hardware specifications are "Addressed in the supplementary materials," so they are not provided in the main text. |
| Software Dependencies | No | The paper does not provide specific ancillary software details, such as library or solver names with version numbers, needed to replicate the experiment. |
| Experiment Setup | No | The paper's NeurIPS Paper Checklist states that experimental details appear in Section 4 and the supplementary materials; however, Section 4 covers the evaluation methods and data without giving specific hyperparameter values or training configurations in the main text. |