Intermediate Layers Matter in Momentum Contrastive Self Supervised Learning
Authors: Aakash Kaku, Sahana Upadhya, Narges Razavian
NeurIPS 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Both loss objectives either outperform standard MoCo, or achieve similar performances on three diverse medical imaging datasets: NIH-Chest Xrays, Breast Cancer Histopathology, and Diabetic Retinopathy. The gains of the improved MoCo are especially large in a low-labeled data regime (e.g. 1% labeled data) with an average gain of 5% across three datasets. We analyze the models trained using our novel approach via feature similarity analysis and layer-wise probing. Our analysis reveals that models trained via our approach have higher feature reuse compared to a standard MoCo and learn informative features earlier in the network. (A hedged sketch of the intermediate-layer loss appears after the table.) |
| Researcher Affiliation | Academia | Aakash Kaku, Center for Data Science, New York University, New York, NY 10011, ark576@nyu.edu; Sahana Upadhya, Department of Computer Science, Courant Institute of Mathematical Sciences, New York, NY 10011, su575@nyu.edu; Narges Razavian, Departments of Population Health and Radiology, NYU Grossman School of Medicine and NYU Center for Data Science, New York, NY 10016, Narges.Razavian@nyulangone.org |
| Pseudocode | No | No structured pseudocode or algorithm blocks were found. |
| Open Source Code | Yes | The code is available at https://github.com/aakashrkaku/intermdiate_layer_matter_ssl. |
| Open Datasets | Yes | We test our method on three diverse medical datasets: NIH chest x-ray, diabetic retinopathy and breast cancer histopathology. NIH Chest X-ray dataset [Wang et al., 2017]... The Eye PACS diabetic retinopathy [Voets et al., 2019b, Kaggle, 2015] dataset... The dataset [Janowczyk and Madabhushi, 2016, Cruz-Roa et al., 2014] consists of... |
| Dataset Splits | Yes | NIH Chest X-ray dataset [Wang et al., 2017]... The data is split, by patient, into training (70%), validation (10%) and test (20%) sets. ...Diabetic retinopathy (DR) dataset The Eye PACS diabetic retinopathy [Voets et al., 2019b, Kaggle, 2015] dataset consists of 88,702 retinal fundus images. The training and validation set consists of 57,146 images with a 20% split for validation set, and the test set consists of 8790 images. ...Breast cancer histopathology dataset... The data is split based on the whole slide images, into training (70%), validation (10%) and test (20%) sets. (A hedged patient-level split sketch appears after the table.) |
| Hardware Specification | Yes | We ran all our experiments on Nvidia Tesla V100 GPU with a memory of 32 GB. |
| Software Dependencies | No | The paper mentions using 'pytorch-lightning' but does not provide specific version numbers for it or any other key software dependencies. |
| Experiment Setup | Yes | We use Resnet-50 as the backbone architecture for all the baseline self-supervised methods, supervised models and our approach. ...The scaling factor multiplies the intermediate feature loss by 0.25 (in case of MSE loss) or by 5e-5 (in case of BT loss). ...Other hyperparameters such as learning rate, embedding dimension, number of negative pairs, temperature scaling, encoder momentum, weight decay and sgd momentum are kept same for all the models. The value for the above hyper-parameters are as follows: learning rate = 0.3, embedding dimension = 128, number of negative pairs = 65536, temperature scaling = 0.07, encoder momentum = 0.99, weight decay = 1e-4 and sgd momentum = 0.9. ...We trained the models by all self supervised methods until the learning saturates which happens around 50 epochs with a batch size of 16. ...trained all the self supervised models until the learning saturates which happens around 100 epochs with a batch size of 32. ...trained all the self supervised models until the learning saturates which happens around 75 epochs with a batch size of 32. (A hedged consolidation of these hyperparameters appears after the table.) |
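
The Research Type row quotes the paper's core claim: adding an intermediate-layer loss to standard MoCo improves transfer on medical imaging tasks. The snippet below is a minimal sketch of that idea, not the authors' implementation. The function name `moco_with_intermediate_loss`, the toy tensor shapes, and the choice of which intermediate feature map to compare are assumptions; the 0.07 temperature and the 0.25 scaling of the MSE term are the values reported in the Experiment Setup row.

```python
# Minimal sketch (not the authors' code) of a MoCo InfoNCE loss combined with
# an intermediate-layer MSE term. Shapes, names, and the choice of layer are
# assumptions for illustration only.
import torch
import torch.nn.functional as F


def moco_with_intermediate_loss(q, k, queue, q_mid, k_mid,
                                temperature=0.07, mid_scale=0.25):
    """q, k: (B, D) projected embeddings from the query/key encoders.
    queue: (D, K) memory bank of negative keys.
    q_mid, k_mid: intermediate feature maps from the two encoders."""
    # Standard MoCo InfoNCE loss: one positive logit, K negative logits.
    l_pos = torch.einsum("nc,nc->n", q, k).unsqueeze(-1)   # (B, 1)
    l_neg = torch.einsum("nc,ck->nk", q, queue)            # (B, K)
    logits = torch.cat([l_pos, l_neg], dim=1) / temperature
    labels = torch.zeros(logits.size(0), dtype=torch.long)  # positives at index 0
    contrastive = F.cross_entropy(logits, labels)

    # Intermediate-layer MSE term, scaled by 0.25 as reported for the MSE variant.
    intermediate = F.mse_loss(q_mid, k_mid)
    return contrastive + mid_scale * intermediate


# Toy usage with random tensors standing in for encoder outputs.
B, D, K = 16, 128, 65536
q = F.normalize(torch.randn(B, D), dim=1)
k = F.normalize(torch.randn(B, D), dim=1)
queue = F.normalize(torch.randn(D, K), dim=0)
q_mid = torch.randn(B, 1024, 14, 14)   # e.g. a ResNet-50 stage-3 feature map (assumed)
k_mid = torch.randn(B, 1024, 14, 14)
loss = moco_with_intermediate_loss(q, k, queue, q_mid, k_mid)
```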
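The Dataset Splits row reports patient-level (or whole-slide-level) 70/10/20 splits. The sketch below shows one common way to implement such a grouped split so that no patient appears in more than one partition; the column name `patient_id` and the helper `split_by_group` are assumptions, not taken from the paper or its repository.

```python
# Minimal sketch of a grouped 70/10/20 split (assumed column names).
import numpy as np
import pandas as pd


def split_by_group(df, group_col="patient_id", fractions=(0.7, 0.1, 0.2), seed=0):
    """Return train/val/test frames with no group shared across partitions."""
    rng = np.random.default_rng(seed)
    groups = df[group_col].unique()
    rng.shuffle(groups)
    n_train = int(fractions[0] * len(groups))
    n_val = int(fractions[1] * len(groups))
    train_g = set(groups[:n_train])
    val_g = set(groups[n_train:n_train + n_val])
    train = df[df[group_col].isin(train_g)]
    val = df[df[group_col].isin(val_g)]
    test = df[~df[group_col].isin(train_g | val_g)]
    return train, val, test


# Toy usage with a fabricated frame of image/patient pairs.
frame = pd.DataFrame({
    "image": [f"img_{i}.png" for i in range(100)],
    "patient_id": [f"p{i // 4}" for i in range(100)],  # 4 images per patient
})
train_df, val_df, test_df = split_by_group(frame)
```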
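Finally, the hyperparameters quoted in the Experiment Setup row are consolidated into a single dictionary for convenience. The key names are assumptions; the values are the ones reported in the paper. The mapping of the three epoch/batch-size schedules to specific datasets is not stated in the excerpt, so the schedules are left unnamed.

```python
# Hedged consolidation of the reported pretraining hyperparameters.
# Key names are assumed; values come from the Experiment Setup row above.
PRETRAINING_CONFIG = {
    "backbone": "resnet50",
    "learning_rate": 0.3,
    "embedding_dim": 128,
    "num_negative_pairs": 65536,
    "temperature": 0.07,
    "encoder_momentum": 0.99,
    "weight_decay": 1e-4,
    "sgd_momentum": 0.9,
    # Scaling applied to the intermediate feature loss, per loss variant.
    "intermediate_loss_scale": {"mse": 0.25, "bt": 5e-5},
    # Three schedules are reported, one per dataset; the dataset-to-schedule
    # mapping is not given in the quoted excerpt.
    "schedules": [
        {"epochs": 50, "batch_size": 16},
        {"epochs": 100, "batch_size": 32},
        {"epochs": 75, "batch_size": 32},
    ],
}
```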