FP-DETR: Detection Transformer Advanced by Fully Pre-training
Authors: Wen Wang, Yang Cao, Jing Zhang, Dacheng Tao
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on the challenging COCO dataset demonstrate that our FP-DETR achieves competitive performance. Moreover, it enjoys better robustness to common corruptions and generalization to small-size datasets than state-of-the-art detection transformers. |
| Researcher Affiliation | Collaboration | 1. University of Science and Technology of China; 2. Institute of Artificial Intelligence, Hefei Comprehensive National Science Center; 3. The University of Sydney; 4. JD Explore Academy, China |
| Pseudocode | No | No pseudocode or algorithm blocks were found in the paper. |
| Open Source Code | Yes | Code will be made publicly available at https://github.com/encounter1997/FP-DETR. |
| Open Datasets | Yes | Datasets. Following the common practice, our detector is pre-trained on ImageNet (Deng et al., 2009) and fine-tuned on COCO 2017 (Lin et al., 2014) train set. ... Besides, we evaluated the model's generalization ability by fine-tuning on the small-size dataset, i.e., Cityscapes dataset (Cordts et al., 2016). |
| Dataset Splits | No | While the paper mentions fine-tuning on the 'COCO 2017 train set' and reporting 'Evaluation results on the val set of COCO 2017', it does not give split percentages, sample counts, or a split methodology in the paper text itself. |
| Hardware Specification | Yes | All experiments are implemented on the NVIDIA A100 GPU. |
| Software Dependencies | No | The paper mentions optimizers like 'AdamW' but does not specify software dependencies with version numbers (e.g., 'PyTorch 1.9', 'Python 3.8'). |
| Experiment Setup | Yes | By default, our FP-DETR is pre-trained on ImageNet-1k (Deng et al., 2009) for 300 epochs with AdamW (Loshchilov & Hutter, 2018) optimizer and cosine learning rate scheduler. Training strategies in DeiT (Touvron et al., 2021a) are adopted, and the image size is set as 224×224. We use a batch size of 1,024 for training, and the initial learning rate is set as 5 × 10⁻⁴. After pre-training, models are fine-tuned for 50 epochs with AdamW optimizer on the downstream tasks. The learning rate is initialized as 1 × 10⁻⁴ and decreased by a factor of 0.1 at the 40th epoch. ... Besides, we set both the number of sampling points and the feature levels in multi-scale deformable attention as 4, and the number of object query embeddings as 300. Models are fine-tuned with a batch size of 32. (A hedged code sketch of this schedule appears below the table.) |
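
The optimization schedule quoted in the Experiment Setup row can be summarized as a minimal PyTorch sketch. This is not the authors' released code: the `model` placeholder, the loop bodies, and the DeiT-style data augmentation are assumptions or omissions, and only the optimizer, learning rates, schedulers, and epoch counts reported in the paper are reflected.

```python
# Minimal sketch of the reported FP-DETR training schedule (not the authors' code).
# The model below is a placeholder; data loading and DeiT-style augmentation are omitted.
import torch

model = torch.nn.Linear(10, 10)  # stand-in for the actual FP-DETR model

# Pre-training: AdamW, lr 5e-4, cosine schedule, 300 epochs, batch size 1,024 (ImageNet-1k, 224x224).
pretrain_optimizer = torch.optim.AdamW(model.parameters(), lr=5e-4)
pretrain_scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(pretrain_optimizer, T_max=300)
for epoch in range(300):
    # ... per-batch forward/backward passes over ImageNet-1k would go here ...
    pretrain_optimizer.step()   # stand-in for the per-batch parameter updates
    pretrain_scheduler.step()

# Fine-tuning: AdamW, lr 1e-4, decayed by 0.1 at epoch 40, 50 epochs, batch size 32 (COCO 2017 train).
finetune_optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
finetune_scheduler = torch.optim.lr_scheduler.MultiStepLR(finetune_optimizer, milestones=[40], gamma=0.1)
for epoch in range(50):
    # ... per-batch forward/backward passes over COCO 2017 train would go here ...
    finetune_optimizer.step()
    finetune_scheduler.step()
```

The multi-scale deformable attention settings mentioned in the same row (4 sampling points, 4 feature levels, 300 object queries) are model-construction hyperparameters rather than optimizer settings and are therefore not shown in this sketch.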