Object-aware Contrastive Learning for Debiased Scene Representation

Authors: Sangwoo Mo, Hyunwoo Kang, Kihyuk Sohn, Chun-Liang Li, Jinwoo Shin

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments demonstrate the effectiveness of our representation learning framework, particularly when trained under multi-object images or evaluated under the background (and distribution) shifted images. From Section 3 (Experiments): We first verify the localization performance of ContraCAM in Section 3.1. We then demonstrate the efficacy of our debiased contrastive learning: object-aware random crop improves the training under multi-object images by reducing contextual bias in Section 3.2, and background mixup improves generalization on background and distribution shifts by reducing background bias in Section 3.3.
Researcher Affiliation | Collaboration | Sangwoo Mo¹, Hyunwoo Kang¹, Kihyuk Sohn², Chun-Liang Li², Jinwoo Shin¹ (¹KAIST, ²Google Cloud AI); {swmo,hyunwookang,jinwoos}@kaist.ac.kr, {kihyuks,chunliang}@google.com
Pseudocode | Yes | We provide the pseudo-code of the entire Iterative ContraCAM procedure in Appendix A.
Open Source Code | Yes | Code is available at https://github.com/alinlab/object-aware-contrastive.
Open Datasets | Yes | We train the models for 800 epochs on COCO [25] and ImageNet-9 [23], and 2,000 epochs on CUB [46] and Flowers [26] datasets with batch size 256.
Dataset Splits | No | The paper mentions training and testing on datasets such as COCO, Flowers, CUB, ImageNet-9, CIFAR-10, CIFAR-100, Food, and Pets, and evaluating via linear evaluation. However, explicit training/validation/test splits (percentages, sample counts, or references to predefined splits for all datasets used) are not provided in the main text. For example, for linear evaluation it mentions training a linear classifier "on top of the learned representation" using the ORIGINAL dataset for the Background Challenge, but gives no specific splits such as 80/10/10.
Hardware Specification | Yes | The training of the baseline models on the COCO (∼100,000 samples) dataset takes 1.5 days on 4 GPUs and 3 days on 8 GPUs for the ResNet-18 and ResNet-50 architectures, respectively, using a single machine with 8 GeForce RTX 2080 Ti GPUs; training time is proportional to the number of samples and training epochs for other cases.
Software Dependencies | No | We apply the conditional random field (CRF) using the default hyperparameters from the pydensecrf library [49] to produce segmentation masks, and use the opencv [50] library to extract bounding boxes.
Experiment Setup | Yes | We train the models for 800 epochs on COCO [25] and ImageNet-9 [23], and 2,000 epochs on CUB [46] and Flowers [26] datasets with batch size 256. We follow the default hyperparameters of MoCo v2 and BYOL, except for a smaller minimum random crop scale of 0.08 (instead of the original 0.2), since it performed better, especially for the multi-object images.
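The Software Dependencies row above describes a mask-to-box step: segmentation masks from the CRF are converted into bounding boxes with OpenCV. As a minimal sketch of what that conversion does, the NumPy function below extracts a tight box from a binary mask; it is a simplified stand-in for OpenCV's contour utilities (the paper's pipeline, per the quote, uses opencv), and the single-object assumption and the `(x, y, w, h)` convention are assumptions for illustration.

```python
import numpy as np

def mask_to_bbox(mask: np.ndarray):
    """Return the tight bounding box (x, y, w, h) of a binary mask.

    Simplified stand-in for an OpenCV contour-based extraction;
    assumes a single foreground object (hypothetical helper).
    """
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        return None  # empty mask: no box to extract
    x0, x1 = xs.min(), xs.max()
    y0, y1 = ys.min(), ys.max()
    return int(x0), int(y0), int(x1 - x0 + 1), int(y1 - y0 + 1)

# Toy mask with a 3x4 foreground patch at rows 2..4, columns 1..4.
mask = np.zeros((8, 8), dtype=np.uint8)
mask[2:5, 1:5] = 1
print(mask_to_bbox(mask))  # (1, 2, 4, 3)
```

For masks with several disconnected objects, OpenCV's contour detection would yield one box per object, which this single-box sketch does not attempt.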
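The Experiment Setup row mentions lowering the minimum random crop scale to 0.08. In standard contrastive-learning augmentations this scale bounds the fraction of image area a crop may cover. The sketch below mimics that area-based sampling (modeled on the common rejection-sampling approach, as in torchvision's RandomResizedCrop); the function name and fallback behavior are assumptions for illustration, not the authors' code.

```python
import math
import random

def sample_crop(height, width, scale=(0.08, 1.0), ratio=(3 / 4, 4 / 3), rng=random):
    """Sample a crop (top, left, h, w): target area uniform in
    scale * image_area, aspect ratio log-uniform in `ratio`.
    Hypothetical helper modeled on RandomResizedCrop-style sampling."""
    area = height * width
    for _ in range(10):  # rejection sampling: retry if the crop doesn't fit
        target_area = area * rng.uniform(*scale)
        aspect = math.exp(rng.uniform(math.log(ratio[0]), math.log(ratio[1])))
        w = int(round(math.sqrt(target_area * aspect)))
        h = int(round(math.sqrt(target_area / aspect)))
        if 0 < w <= width and 0 < h <= height:
            top = rng.randint(0, height - h)
            left = rng.randint(0, width - w)
            return top, left, h, w
    return 0, 0, height, width  # fallback: keep the whole image

# With scale=(0.08, 1.0), a crop may cover as little as ~8% of the image,
# versus 20% under the original minimum scale of 0.2.
top, left, h, w = sample_crop(224, 224, scale=(0.08, 1.0))
print(h * w / (224 * 224))  # crop-area fraction, roughly within [0.08, 1.0]
```

A smaller minimum scale produces more aggressive zoom-in crops, which is why it interacts with multi-object images: small crops are more likely to isolate a single object.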