Proposal-Contrastive Pretraining for Object Detection from Fewer Data
Authors: Quentin Bouniot, Romaric Audigier, Angelique Loesch, Amaury Habrard
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 4 EXPERIMENTS: In this section, we present a comparative study of the results of our proposed method on standard and novel benchmarks for learning with fewer data, as well as an ablative study on the most relevant parts. First, we introduce the datasets, evaluation and training settings. |
| Researcher Affiliation | Academia | Université Paris-Saclay, CEA, LIST, F-91120, Palaiseau, France; Université Jean Monnet Saint-Étienne, CNRS, Institut d'Optique Graduate School, Laboratoire Hubert Curien UMR 5516, F-42023, Saint-Étienne, France; Institut Universitaire de France (IUF) |
| Pseudocode | No | No explicitly labeled pseudocode or algorithm blocks were found in the paper. The methodology is described in text and through figures. |
| Open Source Code | No | The paper mentions that image IDs for reproducibility will be made available, but does not provide a statement or link for the open-sourcing of the code for the described methodology. |
| Open Datasets | Yes | We use ImageNet ILSVRC 2012 (IN) (Russakovsky et al., 2015) for pretraining, MS-COCO (COCO) (Lin et al., 2014) and Pascal VOC 2007 and 2012 (Everingham et al., 2010) for finetuning. |
| Dataset Splits | Yes | To evaluate the performance in learning with fewer data, following previous work (Wei et al., 2021; Bar et al., 2022), we consider the Mini-COCO benchmarks, where we randomly sample 1%, 5% or 10% of the training data. Similarly, we also introduce the novel Mini-VOC benchmark, in which we randomly sample 5% or 10% of the training data. We also use the Few Shot Object Detection (FSOD) dataset (Fan et al., 2020) in the novel FSOD-test and FSOD-train benchmarks. We separate the FSOD test set with 80% of the data randomly sampled for training and the remaining 20% of the data for testing, taking care to keep at least 1 image for each class in both subsets, and do the same for the FSOD train set. In all experiments, we train the models with a batch size of 32 images over 8 A100 GPUs until the validation performance stops increasing... (See the split sketch after this table.) |
| Hardware Specification | Yes | a batch size of N_b = 64 images over 8 A100 GPUs |
| Software Dependencies | No | The paper mentions using specific models/frameworks like 'Def. DETR' and 'SCRL', but does not list any software dependencies with specific version numbers (e.g., Python 3.x, PyTorch 1.x). |
| Experiment Setup | Yes | The hyperparameters are set as follows: the EMA keep rate parameter to 0.999, the IoU threshold δ = 0.5, a batch size of N_b = 64 images over 8 A100 GPUs, and the coefficients in the different losses λ_sim = λ_contrast = 2, which is the same value used for the coefficient governing the class cross-entropy in the supervised loss. The projector is defined as a 2-layer MLP with a hidden layer of 4096 and a last layer of 256, without batch normalization. Following SCE (Denize et al., 2023), we set the temperatures τ = 0.1, τ_t = 0.07 and the coefficient λ_SCE = 0.5. We sample K = 30 random boxes from the outputs of Selective Search for each image at every iteration. Other training and architecture hyperparameters are defined as in Def. DETR (Zhu et al., 2021) with, specifically, the coefficients λ_coord = 5 and λ_giou = 2, the number of object proposals (queries) N = 300, and the learning rate set to lr = 2×10⁻⁴. (See the config sketch after this table.) |
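
The Dataset Splits row describes an 80/20 split of the FSOD test (and train) set with at least one image of every class kept in each subset. The sketch below is an assumption about how such a split might be reproduced, not the authors' released code; the `image_to_classes` mapping, the single-pass coverage repair, and the seed are illustrative choices.

```python
import random
from collections import defaultdict

def split_with_class_coverage(image_to_classes, train_frac=0.8, seed=0):
    """Randomly split images (e.g. 80/20) while trying to keep at least one
    image of every class in both subsets, as described for FSOD-test/train.

    `image_to_classes` maps an image id to the set of class labels it contains
    (an assumed data structure, not taken from the paper).
    """
    rng = random.Random(seed)
    images = list(image_to_classes)
    rng.shuffle(images)
    n_train = int(train_frac * len(images))
    train, test = set(images[:n_train]), set(images[n_train:])

    # Index images by class to check the coverage of each subset.
    class_to_images = defaultdict(list)
    for img, classes in image_to_classes.items():
        for c in classes:
            class_to_images[c].append(img)

    # Single repair pass: if a class is missing from one subset, move one of
    # its images over. A class with a single image can only appear in one side.
    for c, imgs in class_to_images.items():
        if not any(i in train for i in imgs):
            mover = next(i for i in imgs if i in test)
            test.remove(mover)
            train.add(mover)
        elif not any(i in test for i in imgs):
            mover = next(i for i in imgs if i in train)
            train.remove(mover)
            test.add(mover)
    return sorted(train), sorted(test)
```

A similar routine with `train_frac` set to 0.01, 0.05 or 0.10, keeping only the sampled subset and dropping the coverage repair, would approximate the random subsampling used for the Mini-COCO and Mini-VOC benchmarks.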
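The Experiment Setup row lists the pretraining hyperparameters reported in the paper. For readability, they are collected below into a plain Python dictionary; the key names are invented for this summary and do not correspond to any released configuration file.

```python
# Hyperparameters as reported in the paper; key names are illustrative only.
PRETRAIN_CONFIG = {
    "ema_keep_rate": 0.999,        # EMA keep rate for the teacher branch
    "iou_threshold": 0.5,          # delta, IoU threshold
    "batch_size": 64,              # N_b images, over 8 A100 GPUs
    "lambda_sim": 2.0,             # similarity loss coefficient
    "lambda_contrast": 2.0,        # contrastive loss coefficient
    "projector": {                 # 2-layer MLP, no batch normalization
        "hidden_dim": 4096,
        "out_dim": 256,
        "batch_norm": False,
    },
    "tau": 0.1,                    # SCE temperature
    "tau_teacher": 0.07,           # SCE teacher temperature (tau_t)
    "lambda_sce": 0.5,             # SCE coefficient
    "selective_search_boxes": 30,  # K boxes sampled per image per iteration
    # Remaining settings follow Deformable DETR (Zhu et al., 2021):
    "lambda_coord": 5.0,
    "lambda_giou": 2.0,
    "num_queries": 300,            # number of object proposals N
    "learning_rate": 2e-4,
}
```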