Intermediate Layers Matter in Momentum Contrastive Self Supervised Learning

Authors: Aakash Kaku, Sahana Upadhya, Narges Razavian

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Both loss objectives either outperform standard MoCo or achieve similar performance on three diverse medical imaging datasets: NIH Chest X-rays, Breast Cancer Histopathology, and Diabetic Retinopathy. The gains of the improved MoCo are especially large in a low-labeled-data regime (e.g., 1% labeled data), with an average gain of 5% across the three datasets. We analyze the models trained using our novel approach via feature similarity analysis and layer-wise probing. Our analysis reveals that models trained via our approach have higher feature reuse compared to a standard MoCo and learn informative features earlier in the network.
Researcher Affiliation | Academia | Aakash Kaku, Center for Data Science, New York University, New York, NY 10011, ark576@nyu.edu; Sahana Upadhya, Department of Computer Science, Courant Institute of Mathematical Sciences, New York, NY 10011, su575@nyu.edu; Narges Razavian, Departments of Population Health and Radiology, NYU Grossman School of Medicine and NYU Center for Data Science, New York, NY 10016, Narges.Razavian@nyulangone.org
Pseudocode | No | No structured pseudocode or algorithm blocks were found.
Open Source Code | Yes | The code is available at https://github.com/aakashrkaku/intermdiate_layer_matter_ssl.
Open Datasets | Yes | We test our method on three diverse medical datasets: NIH chest x-ray, diabetic retinopathy and breast cancer histopathology. NIH Chest X-ray dataset [Wang et al., 2017]... The Eye PACS diabetic retinopathy [Voets et al., 2019b, Kaggle, 2015] dataset... The dataset [Janowczyk and Madabhushi, 2016, Cruz-Roa et al., 2014] consists of...
Dataset Splits | Yes | NIH Chest X-ray dataset [Wang et al., 2017]... The data is split, by patient, into training (70%), validation (10%) and test (20%) sets. ...Diabetic retinopathy (DR) dataset: The Eye PACS diabetic retinopathy [Voets et al., 2019b, Kaggle, 2015] dataset consists of 88,702 retinal fundus images. The training and validation set consists of 57,146 images with a 20% split for the validation set, and the test set consists of 8,790 images. ...Breast cancer histopathology dataset... The data is split, based on the whole slide images, into training (70%), validation (10%) and test (20%) sets.
Hardware Specification | Yes | We ran all our experiments on Nvidia Tesla V100 GPU with a memory of 32 GB.
Software Dependencies | No | The paper mentions using 'pytorch-lightning' but does not provide specific version numbers for it or any other key software dependencies.
Experiment Setup | Yes | We use Resnet-50 as the backbone architecture for all the baseline self-supervised methods, supervised models and our approach. ...The scaling factor multiplies the intermediate feature loss by 0.25 (in case of MSE loss) or by 5e-5 (in case of BT loss). ...Other hyperparameters such as learning rate, embedding dimension, number of negative pairs, temperature scaling, encoder momentum, weight decay and SGD momentum are kept the same for all the models. The values for the above hyperparameters are as follows: learning rate = 0.3, embedding dimension = 128, number of negative pairs = 65536, temperature scaling = 0.07, encoder momentum = 0.99, weight decay = 1e-4 and SGD momentum = 0.9. ...We trained the models by all self-supervised methods until the learning saturates, which happens around 50 epochs with a batch size of 16. ...trained all the self-supervised models until the learning saturates, which happens around 100 epochs with a batch size of 32. ...trained all the self-supervised models until the learning saturates, which happens around 75 epochs with a batch size of 32.
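The rows above describe the paper's core idea (a MoCo InfoNCE objective augmented with an intermediate-layer feature loss, scaled by 0.25 for the MSE variant) and its hyperparameters (temperature = 0.07, embedding dimension = 128). A minimal sketch of how such a combined loss could be computed is shown below; the function name, tensor shapes, and the exact wiring of the intermediate features are assumptions for illustration, not the authors' released implementation.

```python
import torch
import torch.nn.functional as F

def moco_with_intermediate_mse(q, k, queue, q_mid, k_mid,
                               temperature=0.07, mid_scale=0.25):
    """Sketch of a MoCo-style InfoNCE loss plus an intermediate-layer
    MSE term, scaled by 0.25 as stated in the Experiment Setup row.

    q, k:         L2-normalized final embeddings of two views, shape (N, D)
    queue:        negative keys from the momentum queue, shape (K, D)
    q_mid, k_mid: intermediate features from the query / momentum encoders
                  (hypothetical shapes; the paper taps ResNet-50 stages)
    """
    # Positive logits (N, 1) and negative logits against the queue (N, K)
    l_pos = torch.einsum("nd,nd->n", q, k).unsqueeze(-1)
    l_neg = torch.einsum("nd,kd->nk", q, queue)
    logits = torch.cat([l_pos, l_neg], dim=1) / temperature

    # The positive key sits at index 0 for every query
    labels = torch.zeros(logits.size(0), dtype=torch.long)
    contrastive = F.cross_entropy(logits, labels)

    # Intermediate-layer feature loss (MSE variant), scaled down
    intermediate = F.mse_loss(q_mid, k_mid)
    return contrastive + mid_scale * intermediate
```

The BT (Barlow Twins) variant mentioned in the table would replace the MSE term with a cross-correlation objective and use the much smaller 5e-5 scale.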