Eye-gaze Guided Multi-modal Alignment for Medical Representation Learning

Authors: Chong Ma, Hanqi Jiang, Wenting Chen, Yiwei Li, Zihao Wu, Xiaowei Yu, Zhengliang Liu, Lei Guo, Dajiang Zhu, Tuo Zhang, Dinggang Shen, Tianming Liu, Xiang Li

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct downstream tasks of image classification and image-text retrieval on four medical datasets, where EGMA achieved state-of-the-art performance and stronger generalization across different datasets.
Researcher Affiliation | Collaboration | Chong Ma1, Hanqi Jiang2, Wenting Chen3, Yiwei Li2, Zihao Wu2, Xiaowei Yu4, Zhengliang Liu2, Lei Guo1, Dajiang Zhu4, Tuo Zhang1, Dinggang Shen5, Tianming Liu2, Xiang Li6 — 1Northwest Polytechnical University, 2University of Georgia, 3City University of Hong Kong, 4University of Texas at Arlington, 5ShanghaiTech University & Shanghai United Imaging Intelligence Co., 6Massachusetts General Hospital, Harvard University
Pseudocode | No | The paper describes algorithms using mathematical formulations and descriptive text, but does not include a clearly labeled pseudocode or algorithm block.
Open Source Code | Yes | The code of this work is available on GitHub: https://github.com/MoMarky/EGMA
Open Datasets | Yes | In this work, we utilize the MIMIC-EYE [14] dataset as our training set, consisting of 3689 images extracted from the MIMIC datasets [19, 20, 18, 17]. Each sample is accompanied by corresponding eye-tracking data and transcript text. These eye-tracking data are provided by the publicly available EYE GAZE [22] and REFLACX [31] datasets on PhysioNet [12].
Dataset Splits | Yes | RSNA [44] is a comprehensive dataset for pneumonia diagnosis. It contains 29,700 chest X-ray images categorized into normal and pneumonia-positive categories. We follow [53] to divide the data into 70% for training, 15% for validation, and 15% for testing.
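The 70/15/15 partition quoted above can be sketched as follows. This is a minimal illustration, not the paper's actual split code; the function name, seed, and shuffle strategy are assumptions:

```python
import random

def split_dataset(indices, train_frac=0.70, val_frac=0.15, seed=0):
    """Shuffle indices and partition into train/val/test.

    The remainder after the train and val fractions goes to test.
    Seeding makes the split reproducible across runs.
    """
    rng = random.Random(seed)
    idx = list(indices)
    rng.shuffle(idx)
    n_train = int(len(idx) * train_frac)
    n_val = int(len(idx) * val_frac)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

# For the 29,700 RSNA images this yields 20,790 / 4,455 / 4,455 examples.
train_idx, val_idx, test_idx = split_dataset(range(29700))
print(len(train_idx), len(val_idx), len(test_idx))  # 20790 4455 4455
```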
Hardware Specification | Yes | And all our training tasks are completed on four RTX 3090 GPUs.
Software Dependencies | No | The paper mentions specific software components like "Swin Transformer" and "BioClinicalBERT" as image and text encoders, but does not provide version numbers for general software dependencies (e.g., Python, PyTorch, CUDA) required to replicate the experiment.
Experiment Setup | Yes | In the pre-training process... we train our model with 50 epochs with an initial learning rate 1 × 10−6 and weight decay 1 × 10−4 and 10 epochs of warm-up. In the supervised classification experiments... we fine-tune our model with 30 epochs with an initial learning rate 5 × 10−7 and weight decay 1 × 10−4 and 6 epochs of warm-up.
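The pre-training schedule quoted above (base LR 1 × 10−6, 50 epochs, 10-epoch warm-up) can be sketched as a per-epoch learning-rate function. The linear warm-up and cosine-decay shapes are assumptions; the paper states only the base learning rate, epoch count, and warm-up length:

```python
import math

def lr_at_epoch(epoch, base_lr=1e-6, warmup=10, total=50):
    """Return the learning rate for a given epoch.

    Assumed schedule: linear warm-up from ~0 to base_lr over `warmup`
    epochs, then cosine decay toward zero over the remaining epochs.
    """
    if epoch < warmup:
        return base_lr * (epoch + 1) / warmup
    progress = (epoch - warmup) / (total - warmup)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

# The fine-tuning setup would use base_lr=5e-7, warmup=6, total=30.
```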