CheXpert: A Large Chest Radiograph Dataset with Uncertainty Labels and Expert Comparison

Authors: Jeremy Irvin, Pranav Rajpurkar, Michael Ko, Yifan Yu, Silviana Ciurea-Ilcus, Chris Chute, Henrik Marklund, Behzad Haghgoo, Robyn Ball, Katie Shpanskaya, Jayne Seekins, David A. Mong, Safwan S. Halabi, Jesse K. Sandberg, Ricky Jones, David B. Larson, Curtis P. Langlotz, Bhavik N. Patel, Matthew P. Lungren, Andrew Y. Ng

AAAI 2019, pp. 590-597 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "On a validation set of 200 chest radiographic studies which were manually annotated by 3 board-certified radiologists, we find that different uncertainty approaches are useful for different pathologies. We then evaluate our best model on a test set composed of 500 chest radiographic studies annotated by a consensus of 5 board-certified radiologists, and compare the performance of our model to that of 3 additional radiologists in the detection of 5 selected pathologies."
Researcher Affiliation | Academia | Department of Computer Science, Stanford University; Department of Medicine, Stanford University; Department of Radiology, Stanford University
Pseudocode | No | No pseudocode or clearly labeled algorithm blocks are present in the paper.
Open Source Code | No | The paper states: "We release the dataset to the public as a standard benchmark to evaluate performance of chest radiograph interpretation models." and provides a link to the dataset/competition page (https://stanfordmlgroup.github.io/competitions/chexpert), but does not explicitly state that the source code for the methodology is open-sourced or provide a direct link to it.
Open Datasets | Yes | "We release the dataset to the public as a standard benchmark to evaluate performance of chest radiograph interpretation models."
Dataset Splits | Yes | "The validation set contains 200 studies from 200 patients randomly sampled from the full dataset with no patient overlap with the report evaluation set. The test set consists of 500 studies from 500 patients randomly sampled from the 1000 studies in the report test set." Table 1 of the paper lists the 14 labeled observations in the CheXpert dataset and reports the number of training-set studies containing each.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments.
Software Dependencies | No | The paper mentions software such as NLTK, the Bllip parser, and Stanford CoreNLP, but does not provide specific version numbers for these or other key software dependencies.
Experiment Setup | Yes | "Images are fed into the network with size 320 × 320 pixels. We use the Adam optimizer with default β-parameters of β1 = 0.9, β2 = 0.999 and learning rate 1 × 10⁻⁴ which is fixed for the duration of the training. Batches are sampled using a fixed batch size of 16 images. We train for 3 epochs, saving checkpoints every 4800 iterations."
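The reported setup can be sketched in plain Python. The hyperparameters below (image size, batch size, epochs, checkpoint interval, learning rate, Adam betas) come from the quoted passage; the single-scalar Adam step is a generic illustration of the optimizer's update rule, not the authors' training code, and `adam_step` is a hypothetical helper name.

```python
# Hyperparameters quoted in the paper's experiment setup.
CONFIG = {
    "image_size": (320, 320),     # input resolution in pixels
    "batch_size": 16,             # fixed batch size
    "epochs": 3,                  # total training epochs
    "checkpoint_every": 4800,     # iterations between checkpoints
    "lr": 1e-4,                   # fixed learning rate
    "betas": (0.9, 0.999),        # Adam default beta parameters
}

def adam_step(theta, grad, m, v, t,
              lr=CONFIG["lr"],
              beta1=CONFIG["betas"][0],
              beta2=CONFIG["betas"][1],
              eps=1e-8):
    """One Adam update for a single scalar parameter (illustrative only)."""
    m = beta1 * m + (1 - beta1) * grad          # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad ** 2     # second-moment estimate
    m_hat = m / (1 - beta1 ** t)                # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)                # bias-corrected second moment
    theta = theta - lr * m_hat / (v_hat ** 0.5 + eps)
    return theta, m, v

# One toy step on a scalar with gradient 1.0: the bias-corrected update
# at t=1 moves the parameter by approximately -lr.
theta, m, v = adam_step(theta=0.0, grad=1.0, m=0.0, v=0.0, t=1)
```

In a real training loop these constants would configure a framework optimizer (e.g., PyTorch's `torch.optim.Adam(params, lr=1e-4, betas=(0.9, 0.999))`) with a fixed learning rate, since the paper states the rate is not decayed during training.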