Proposal-Contrastive Pretraining for Object Detection from Fewer Data

Authors: Quentin Bouniot, Romaric Audigier, Angélique Loesch, Amaury Habrard

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 4 EXPERIMENTS: In this section, we present a comparative study of the results of our proposed method on standard and novel benchmarks for learning with fewer data, as well as an ablative study on the most relevant parts. First, we introduce the datasets, evaluation and training settings.
Researcher Affiliation | Academia | Université Paris-Saclay, CEA, LIST, F-91120, Palaiseau, France; Université Jean Monnet Saint-Étienne, CNRS, Institut d'Optique Graduate School, Laboratoire Hubert Curien UMR 5516, F-42023, Saint-Étienne, France; Institut Universitaire de France (IUF)
Pseudocode | No | No explicitly labeled pseudocode or algorithm blocks were found in the paper. The methodology is described in text and through figures.
Open Source Code | No | The paper states that image IDs will be made available for reproducibility, but provides no statement or link indicating that the code for the described methodology is open-sourced.
Open Datasets | Yes | We use ImageNet ILSVRC 2012 (IN) (Russakovsky et al., 2015) for pretraining, MS-COCO (COCO) (Lin et al., 2014) and Pascal VOC 2007 and 2012 (Everingham et al., 2010) for finetuning.
Dataset Splits | Yes | To evaluate the performance in learning with fewer data, following previous work (Wei et al., 2021; Bar et al., 2022), we consider the Mini-COCO benchmarks, where we randomly sample 1%, 5% or 10% of the training data. Similarly, we also introduce the novel Mini-VOC benchmark, in which we randomly sample 5% or 10% of the training data. We also use the Few-Shot Object Detection (FSOD) dataset (Fan et al., 2020) in the novel FSOD-test and FSOD-train benchmarks. We separate the FSOD test set with 80% of the data randomly sampled for training and the remaining 20% of the data for testing, taking care to have at least 1 image for each class in both subsets, and do the same for the FSOD train set. In all experiments, we train the models with a batch size of 32 images over 8 A100 GPUs until the validation performance stops increasing...
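The splitting procedure quoted above lends itself to a short illustration. The sketch below shows one possible way to build the Mini-COCO/Mini-VOC subsets by random sampling of image IDs, and an FSOD-style 80/20 split that keeps at least one image of every class in both subsets. Helper names, seeds, and data structures are hypothetical; only the sampling fractions, the 80/20 ratio, and the per-class constraint come from the quoted text.

```python
# Sketch of the subset sampling and FSOD splitting described above.
# Function names and arguments are assumptions, not the authors' code.
import random


def sample_mini_subset(image_ids, fraction, seed=0):
    """Randomly keep `fraction` of the images (e.g. 0.01, 0.05 or 0.10 for Mini-COCO)."""
    rng = random.Random(seed)
    k = max(1, int(len(image_ids) * fraction))
    return rng.sample(list(image_ids), k)


def split_fsod(images_per_class, train_ratio=0.8, seed=0):
    """80/20 split that keeps at least one image of every class in both subsets."""
    rng = random.Random(seed)
    train, test = set(), set()
    # First pass: make sure every class is represented in both subsets.
    for ids in images_per_class.values():
        ids = list(ids)
        rng.shuffle(ids)
        if not any(i in train for i in ids):
            cand = next((i for i in ids if i not in test), None)
            if cand is not None:
                train.add(cand)
        if not any(i in test for i in ids):
            cand = next((i for i in ids if i not in train), None)
            if cand is not None:
                test.add(cand)
    # Second pass: distribute the remaining images to reach the 80/20 ratio.
    all_ids = {i for ids in images_per_class.values() for i in ids}
    remaining = sorted(all_ids - train - test)
    rng.shuffle(remaining)
    n_train = max(0, int(train_ratio * len(all_ids)) - len(train))
    train.update(remaining[:n_train])
    test.update(remaining[n_train:])
    return sorted(train), sorted(test)
```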
Hardware Specification | Yes | a batch size of Nb = 64 images over 8 A100 GPUs
Software Dependencies | No | The paper mentions using specific models/frameworks such as 'Def. DETR' and 'SCRL', but does not list any software dependencies with specific version numbers (e.g., Python 3.x, PyTorch 1.x).
Experiment Setup | Yes | The hyperparameters are set as follows: the EMA keep rate parameter to 0.999, the IoU threshold δ = 0.5, a batch size of Nb = 64 images over 8 A100 GPUs, and the coefficients in the different losses λsim = λcontrast = 2, which is the same value used for the coefficient governing the class cross-entropy in the supervised loss. The projector is defined as a 2-layer MLP with a hidden layer of 4096 and a last layer of 256, without batch normalization. Following SCE (Denize et al., 2023), we set the temperatures τ = 0.1, τt = 0.07 and the coefficient λSCE = 0.5. We sample K = 30 random boxes from the outputs of Selective Search for each image at every iteration. Other training and architecture hyperparameters are defined as in Def. DETR (Zhu et al., 2021) with, specifically, the coefficients λcoord = 5 and λgiou = 2, the number of object proposals (queries) N = 300, and the learning rate set to lr = 2 × 10⁻⁴.
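For concreteness, the snippet below is a minimal PyTorch sketch that collects the hyperparameters quoted above and defines the 2-layer projector (4096 hidden units, 256-d output, no batch normalization). The 256-d input dimension is an assumption (the usual Deformable DETR hidden size); the module and dictionary names are hypothetical, not the authors' implementation.

```python
# Hypothetical sketch of the projector and hyperparameters from the quoted setup.
# The 256-d input is an assumption; the other values are stated in the paper.
import torch.nn as nn


class ProjectionHead(nn.Module):
    """2-layer MLP projector: 4096 hidden units, 256-d output, no batch normalization."""

    def __init__(self, in_dim=256, hidden_dim=4096, out_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden_dim),
            nn.ReLU(inplace=True),
            nn.Linear(hidden_dim, out_dim),
        )

    def forward(self, x):
        return self.net(x)


HPARAMS = {
    "ema_keep_rate": 0.999,  # teacher EMA keep rate
    "iou_threshold": 0.5,    # δ, IoU threshold for proposal matching
    "batch_size": 64,        # Nb images, spread over 8 A100 GPUs
    "lambda_sim": 2.0,       # similarity loss coefficient
    "lambda_contrast": 2.0,  # contrastive loss coefficient
    "tau": 0.1,              # SCE temperature (student)
    "tau_t": 0.07,           # SCE temperature (teacher)
    "lambda_sce": 0.5,       # SCE coefficient
    "k_random_boxes": 30,    # Selective Search proposals sampled per image
    "lambda_coord": 5.0,     # Deformable DETR box L1 coefficient
    "lambda_giou": 2.0,      # Deformable DETR GIoU coefficient
    "num_queries": 300,      # number of object proposals (queries)
    "lr": 2e-4,              # learning rate
}
```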