Disentangling Human Error from Ground Truth in Segmentation of Medical Images
Authors: Le Zhang, Ryutaro Tanno, Mou-Cheng Xu, Chen Jin, Joseph Jacob, Olga Ciccarelli, Frederik Barkhof, Daniel Alexander
NeurIPS 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | For evaluation, we first simulate a diverse range of annotator types on the MNIST dataset by performing morphometric operations with Morpho-MNIST framework [19]. Then we demonstrate the potential in several real-world medical imaging datasets, namely (i) MS lesion segmentation dataset (MSLSC) from the ISBI 2015 challenge [20], (ii) Brain tumour segmentation dataset (BraTS) [4] and (iii) Lung nodule segmentation dataset (LIDC-IDRI) [21]. Experiments on all datasets demonstrate that our method consistently leads to better segmentation performance compared to widely adopted label-fusion methods and other relevant baselines, especially when the number of available labels for each image is low and the degree of annotator disagreement is high. |
| Researcher Affiliation | Collaboration | 1Queen Square Multiple Sclerosis Centre, Department of Neuroinflammation, Queen Square Institute of Neurology, Faculty of Brain Sciences, University College London, London, UK. 2Centre for Medical Image Computing, Department of Computer Science, University College London, London, UK. 3Healthcare Intelligence, Microsoft Research, Cambridge, UK |
| Pseudocode | No | The paper describes the model and learning process using text and mathematical equations but does not include any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | Yes | Our code is available at https://github.com/moucheng2017/Learn_Noisy_Labels_Medical_Images. |
| Open Datasets | Yes | For evaluation, we first simulate a diverse range of annotator types on the MNIST dataset by performing morphometric operations with Morpho-MNIST framework [19]. Then we demonstrate the potential in several real-world medical imaging datasets, namely (i) MS lesion segmentation dataset (MSLSC) from the ISBI 2015 challenge [20], (ii) Brain tumour segmentation dataset (BraTS) [4] and (iii) Lung nodule segmentation dataset (LIDC-IDRI) [21]. |
| Dataset Splits | Yes | MNIST dataset consists of 60,000 training and 10,000 testing examples, all of which are 28×28 grayscale images of digits from 0 to 9, and we derive the segmentation labels by thresholding the intensity values at 0.5. The MS dataset is publicly available and comprises 21 3D scans from 5 subjects. All scans are split into 10 for training and 11 for testing. We hold out 20% of training images as a validation set for both datasets. We extract each slice as 2D images and split them case-wise to have 1600 images for training, 300 for validation and 500 for testing. We split the dataset case-wise into a training (722 patients), validation (144 patients) and testing (144 patients). We then resampled the CT scans to 1mm × 1mm in-plane resolution... We hold 5000 images in the training set, 1000 images in the validation set and 1000 images in the test set. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., CPU, GPU models, or memory) used for running the experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers for its implementation, such as Python versions or library versions (e.g., PyTorch, TensorFlow, scikit-learn). |
| Experiment Setup | No | The paper mentions pre-processing steps like centre cropping and normalisation, and that parameters are learned via stochastic gradient descent. It also discusses a regularisation coefficient λ that is varied. However, it does not explicitly provide concrete hyperparameter values such as learning rate, batch size, or number of epochs. |
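
The Research Type and Open Datasets rows quote the paper's use of the Morpho-MNIST framework [19] to simulate annotator types on MNIST, and the Dataset Splits row notes that segmentation targets are derived by thresholding intensities at 0.5. The sketch below is illustrative only: it uses plain scipy morphology as a stand-in for Morpho-MNIST's operators, and the annotator names and iteration counts are assumptions rather than settings from the paper.

```python
# Illustrative sketch, not the authors' code: threshold a [0, 1] digit image at
# 0.5 to obtain a ground-truth mask, then mimic over-/under-segmenting
# annotators with simple morphological operations (a stand-in for Morpho-MNIST).
import numpy as np
from scipy.ndimage import binary_dilation, binary_erosion

def threshold_mask(digit: np.ndarray) -> np.ndarray:
    """Derive a binary segmentation target from a grayscale digit in [0, 1]."""
    return digit > 0.5

def simulate_annotators(gt_mask: np.ndarray) -> dict:
    """Return noisy labels from three hypothetical annotator types."""
    return {
        "over_segmenting": binary_dilation(gt_mask, iterations=2),  # systematically too large
        "under_segmenting": binary_erosion(gt_mask, iterations=1),  # systematically too small
        "faithful": gt_mask.copy(),                                  # unbiased annotator
    }

digit = np.random.rand(28, 28)  # placeholder for a 28x28 MNIST image scaled to [0, 1]
noisy_labels = simulate_annotators(threshold_mask(digit))
```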
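
The Dataset Splits row also describes case-wise (patient-level) splits for the 3D datasets, e.g. 722/144/144 patients for the CT data, so that slices from one case never leak across subsets. Below is a minimal sketch of such a split; the function name, seed, and use of random shuffling are assumptions for illustration, not the paper's procedure.

```python
# Minimal sketch (assumed helper, not the authors' pipeline): a case-wise split
# so that every slice from a given patient lands in exactly one subset.
import random

def case_wise_split(patient_ids, n_train=722, n_val=144, n_test=144, seed=0):
    """Shuffle patient IDs and cut them into train/val/test by case."""
    ids = list(patient_ids)
    random.Random(seed).shuffle(ids)
    train = ids[:n_train]
    val = ids[n_train:n_train + n_val]
    test = ids[n_train + n_val:n_train + n_val + n_test]
    return train, val, test

# Example with the patient counts quoted in the Dataset Splits row.
train_ids, val_ids, test_ids = case_wise_split([f"case-{i:04d}" for i in range(1010)])
```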
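
The Experiment Setup row notes that parameters are learned with stochastic gradient descent and that a regularisation coefficient λ is varied, without reporting concrete hyperparameters. The sketch below shows the kind of objective the paper describes: a cross-entropy between each annotator's labels and confusion-matrix-corrupted predictions plus a trace regulariser weighted by λ. Tensor shapes, names, and the exact normalisation are assumptions here; the released repository should be treated as the reference implementation.

```python
# Illustrative sketch (assumed shapes/names, not the released implementation):
# couple the segmentation network's class probabilities with each annotator's
# pixel-wise confusion matrix (CM) and add a trace regulariser weighted by
# the coefficient lambda that the paper varies.
import torch
import torch.nn.functional as F

def noisy_label_loss(seg_probs, annotator_cms, annotations, lam):
    """
    seg_probs:     (B, C, H, W) softmax output of the segmentation network.
    annotator_cms: list of (B, C, C, H, W) row-normalised pixel-wise CMs,
                   one tensor per annotator.
    annotations:   list of (B, H, W) integer label maps, one per annotator.
    lam:           trace-regularisation coefficient (varied in the paper).
    """
    ce_term, trace_term = 0.0, 0.0
    for cm, labels in zip(annotator_cms, annotations):
        # Corrupt the estimated true-label probabilities with the annotator's CM:
        # p_noisy[c] = sum_k CM[c, k] * p_true[k] at every pixel.
        noisy_probs = torch.einsum('bckhw,bkhw->bchw', cm, seg_probs)
        ce_term = ce_term + F.nll_loss(torch.log(noisy_probs + 1e-8), labels)
        # Mean per-pixel trace of the CM (sum of its diagonal entries).
        trace_term = trace_term + torch.einsum('bcchw->bhw', cm).mean()
    return ce_term + lam * trace_term
```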