Advancing Radiograph Representation Learning with Masked Record Modeling

Authors: Hong-Yu Zhou, Chenyu Lian, Liansheng Wang, Yizhou Yu

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we mainly compare MRM against report- and self-supervised R2L methodologies on 5 well-established public datasets. Average results are reported over three training runs. Specifically, we find that MRM offers superior performance in label-efficient fine-tuning. For instance, MRM achieves 88.5% mean AUC on CheXpert using 1% labeled data, outperforming previous R2L methods with 100% labels.
Researcher Affiliation | Collaboration | Hong-Yu Zhou1,2, Chenyu Lian1, Liansheng Wang1, Yizhou Yu2,3. 1School of Informatics, Xiamen University; 2Department of Computer Science, The University of Hong Kong; 3AI Lab, Deepwise Healthcare. whuzhouhongyu@gmail.com, cylian@stu.xmu.edu.cn, lswang@xmu.edu.cn, yizhouy@acm.org
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | Yes | Code and models are available at https://github.com/RL4M/MRM-pytorch.
Open Datasets | Yes | We conduct pre-training on MIMIC-CXR (Johnson et al., 2019), one of the largest X-ray datasets, which contains more than 370,000 radiograph images from over 220,000 patient studies. ... We evaluate the pre-trained model on 4 X-ray datasets in the classification tasks, which are NIH Chest X-ray (Wang et al., 2017), CheXpert (Irvin et al., 2019), RSNA Pneumonia (Shih et al., 2019), and COVID-19 Image Data Collection (Cohen et al., 2020). For the segmentation task, we fine-tune the pre-trained model on SIIM-ACR Pneumothorax Segmentation.
Dataset Splits | Yes | CheXpert introduces a multi-label classification problem on chest X-rays. ... The training/validation/test split each constitutes 218,414/5,000/234 images of the whole dataset. ... We adopt the official data split, where the training/validation/test set comprises 25,184/1,500/3,000 images, respectively. ... The training/validation/test split each constitutes 70%/10%/20% of the whole dataset. ... where the training/validation/test set comprises 356/54/99 radiographs, respectively. ... where the training/validation/test set contains 297/43/86 cases, respectively. ... We follow Huang et al. (2021) to construct the training/validation/test split, where each constitutes 70%/15%/15% of the whole dataset.
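
Some of the quoted splits are given only as ratios (e.g., the 70%/15%/15% split following Huang et al. (2021)) rather than as published index files. A minimal sketch of one way to realize such a ratio-based split is shown below; the function name, seed, and sizes are hypothetical illustrations, not taken from the paper or its repository.

# Hypothetical helper (not from the MRM codebase) showing one way to cut
# dataset indices into 70%/15%/15% training/validation/test subsets.
import random

def split_indices(num_samples, train_frac=0.70, val_frac=0.15, seed=42):
    """Shuffle indices reproducibly and slice them into train/val/test."""
    indices = list(range(num_samples))
    random.Random(seed).shuffle(indices)
    n_train = int(num_samples * train_frac)
    n_val = int(num_samples * val_frac)
    return (indices[:n_train],
            indices[n_train:n_train + n_val],
            indices[n_train + n_val:])

# Example: a dataset of 1,000 images yields 700/150/150 indices.
train_idx, val_idx, test_idx = split_indices(1000)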
Hardware Specification | Yes | The pre-training experiments were conducted on 4 GeForce RTX 3080Ti GPUs, and the training time is about 2 days for 200 epochs, requiring 12GB memory from each GPU. ... For fine-tuning on SIIM, we train the segmentation network on 4 GeForce RTX 3080Ti GPUs. For fine-tuning on other datasets, we train the classification network on a single GeForce RTX 3080Ti GPU.
Software Dependencies | Yes | Our code is implemented using PyTorch 1.8.2 (Paszke et al., 2019).
Experiment Setup | Yes | Our code is implemented using PyTorch 1.8.2 (Paszke et al., 2019). The pre-training experiments were conducted on 4 GeForce RTX 3080Ti GPUs, and the training time is about 2 days for 200 epochs, requiring 12GB memory from each GPU. The training batch size is 256. We use AdamW (Loshchilov & Hutter, 2017) as the default optimizer, where the initial learning rate is 1.5e-4, weight decay is 0.05, β1 is 0.9, and β2 is 0.95. The MSE and cross-entropy losses are used for masked image and language modeling, respectively. In practice, we set λ in Eq. 3 to 1.
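
The quoted setup suggests a combined objective of the form L = L_MIM + λ·L_MLM with λ = 1, and it maps directly onto standard PyTorch calls. A minimal sketch follows, assuming hypothetical function and tensor names (make_optimizer, masked_record_loss, image_pred, token_logits, etc.); only the hyperparameters (lr 1.5e-4, weight decay 0.05, betas (0.9, 0.95), λ = 1) come from the quote, so this is an illustration of the reported configuration, not the authors' implementation.

# Sketch of the reported optimizer and pre-training objective in PyTorch.
# All names below are hypothetical placeholders.
import torch
import torch.nn.functional as F

LAMBDA = 1.0  # weighting term lambda in the paper's Eq. 3

def make_optimizer(model):
    """AdamW with the hyperparameters quoted above."""
    return torch.optim.AdamW(
        model.parameters(),
        lr=1.5e-4,
        weight_decay=0.05,
        betas=(0.9, 0.95),
    )

def masked_record_loss(image_pred, image_target, token_logits, token_targets):
    """MSE on masked image patches plus cross-entropy on masked report tokens."""
    loss_mim = F.mse_loss(image_pred, image_target)  # masked image modeling
    loss_mlm = F.cross_entropy(                      # masked language modeling
        token_logits.view(-1, token_logits.size(-1)),
        token_targets.view(-1),
        ignore_index=-100,  # common convention: skip unmasked token positions
    )
    return loss_mim + LAMBDA * loss_mlm

Training such a loss with batch size 256 for 200 epochs would match the schedule quoted in the row above.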