Hard Negative Mixing for Contrastive Learning

Authors: Yannis Kalantidis, Mert Bulent Sariyildiz, Noé Pion, Philippe Weinzaepfel, Diane Larlus

NeurIPS 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We exhaustively ablate our approach on linear classification, object detection, and instance segmentation and show that employing our hard negative mixing procedure improves the quality of visual representations learned by a state-of-the-art self-supervised learning method. We learn representations on two datasets, the common ImageNet-1K [35], and its smaller ImageNet-100 subset, also used in [36, 38].
Researcher Affiliation | Industry | Yannis Kalantidis, Mert Bulent Sariyildiz, Noé Pion, Philippe Weinzaepfel, Diane Larlus; NAVER LABS Europe, Grenoble, France
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. (A hedged sketch of the hard negative mixing step is given after the table.)
Open Source Code | Yes | Project page: https://europe.naverlabs.com/mochi
Open Datasets | Yes | We learn representations on two datasets, the common ImageNet-1K [35], and its smaller ImageNet-100 subset, also used in [36, 38]. For object detection on PASCAL VOC [15] we follow [21] and fine-tune a Faster R-CNN [34], R50-C4 on trainval07+12 and test on test2007. In Table 3 we present results for object detection and semantic segmentation on the COCO dataset [28].
Dataset Splits | Yes | For linear classification on ImageNet-100 (resp. ImageNet-1K), we follow the common protocol and report results on the validation set.
Hardware Specification | No | The paper states 'We run all experiments on 4 GPU servers' but does not specify the GPU models, CPU, or memory used for the experiments.
Software Dependencies | No | The paper mentions using 'detectron2' and building on 'MoCo-v2' but does not provide version numbers for these components or for other libraries such as Python, PyTorch, or CUDA.
Experiment Setup | Yes | For linear classification on ImageNet-100 (resp. ImageNet-1K), we follow the common protocol and report results on the validation set. We report performance after learning linear classifiers for 60 (resp. 100) epochs, with an initial learning rate of 10.0 (30.0), a batch size of 128 (resp. 512) and a step learning rate schedule that drops at epochs 30, 40 and 50 (resp. 60, 80). For training we use K = 16k (resp. K = 65k). For MoCHi, we also have a warm-up of 10 (resp. 15) epochs, i.e. for the first epochs we do not synthesize hard negatives. (A sketch of the linear-evaluation schedule also follows the table.)
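
Since the paper itself contains no pseudocode (see the Pseudocode row above), the following is a minimal, hedged sketch of the hard negative mixing step the paper describes: the hardest negatives in the MoCo queue are mixed pairwise, and optionally mixed with the query itself, to synthesize extra negatives for the contrastive loss. The function name, tensor shapes, and default values of `n_hard`, `s` and `s_prime` are illustrative assumptions, not the authors' released code.

```python
import torch
import torch.nn.functional as F

def mochi_negatives(q, queue, n_hard=1024, s=1024, s_prime=256):
    """Hedged sketch of hard negative mixing (MoCHi), not the authors' code.

    q:     (D,)   L2-normalised query embedding
    queue: (K, D) L2-normalised negatives from the MoCo memory queue
    Returns the queue extended with s + s_prime synthetic hard negatives.
    """
    # 1) Rank queue negatives by similarity to the query, keep the hardest n_hard.
    sims = queue @ q
    hard = queue[sims.topk(n_hard).indices]               # (n_hard, D)

    # 2) Synthesize s negatives as convex combinations of random pairs of hard negatives.
    i = torch.randint(n_hard, (s,))
    j = torch.randint(n_hard, (s,))
    alpha = torch.rand(s, 1)
    mixed = F.normalize(alpha * hard[i] + (1 - alpha) * hard[j], dim=1)

    # 3) Synthesize s_prime harder negatives by mixing the query itself into hard
    #    negatives, with a mixing weight kept below 0.5 so the result stays a negative.
    k = torch.randint(n_hard, (s_prime,))
    beta = 0.5 * torch.rand(s_prime, 1)
    mixed_q = F.normalize(beta * q.unsqueeze(0) + (1 - beta) * hard[k], dim=1)

    # 4) The contrastive (InfoNCE) logits are then computed against this extended set.
    return torch.cat([queue, mixed, mixed_q], dim=0)
```

In a MoCo-v2 style training loop this would be called once per batch before computing the InfoNCE loss, and only after the warm-up epochs mentioned in the Experiment Setup row.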
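
The Experiment Setup row also pins down the linear-evaluation protocol. Below is a minimal sketch of the ImageNet-1K variant (100 epochs, initial learning rate 30.0, batch size 512, step drops at epochs 60 and 80) using a standard PyTorch optimizer and scheduler. The 0.1 drop factor, SGD momentum, and zero weight decay are assumptions not stated in the quoted text, and `linear_head` and the data loop are placeholders.

```python
import torch

# Hedged sketch of the ImageNet-1K linear-evaluation schedule quoted above:
# 100 epochs, initial LR 30.0, batch size 512, LR dropped at epochs 60 and 80.
# (For ImageNet-100: 60 epochs, LR 10.0, batch size 128, drops at 30, 40, 50.)
linear_head = torch.nn.Linear(2048, 1000)   # placeholder head on frozen ResNet-50 features

# Momentum 0.9, weight decay 0, and drop factor 0.1 are assumptions from the common
# protocol; the quoted setup only gives the epochs, LRs, batch sizes, and drop epochs.
optimizer = torch.optim.SGD(linear_head.parameters(), lr=30.0, momentum=0.9, weight_decay=0.0)
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[60, 80], gamma=0.1)

for epoch in range(100):
    # ... one pass over ImageNet-1K with batch size 512 and the frozen encoder ...
    scheduler.step()
```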