Bounding Box Stability against Feature Dropout Reflects Detector Generalization across Environments
Authors: Yang Yang, Wenhai Wang, Zhe Chen, Jifeng Dai, Liang Zheng
ICLR 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We perform extensive experiments to verify this capability and show that the mAP evaluator based on BoS score yields very competitive results. |
| Researcher Affiliation | Collaboration | 1 The Australian National University, 2 OpenGVLab, Shanghai AI Laboratory, 3 The Chinese University of Hong Kong, 4 Nanjing University, 5 Tsinghua University |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code and data are available at https://github.com/YangYangGirl/BoS. |
| Open Datasets | Yes | For vehicle detection, we use 10 datasets, including COCO (Lin et al., 2014), BDD (Yu et al., 2020), Cityscapes (Cordts et al., 2015), DETRAC (Wen et al., 2020), Exdark (Loh & Chan, 2019), Kitti (Geiger et al., 2013), Self-driving (Kaggle, 2020a), Roboflow (Kaggle, 2022), Udacity (Kaggle, 2021), Traffic (Kaggle, 2020b). |
| Dataset Splits | Yes | We employ the leave-one-out evaluation procedure, where each time we use 450 labeled sample sets {D^l_1, D^l_2, ...} generated from 9 out of 10 sources as the training meta-set. |
| Hardware Specification | Yes | We conduct detector training experiments on four A100 GPUs, and detector testing experiments on one A100, with PyTorch 1.9.0 and CUDA 11.1. The CPU is an Intel(R) Xeon(R) Gold 6248R 24-Core Processor. |
| Software Dependencies | Yes | PyTorch 1.9.0 and CUDA 11.1. |
| Experiment Setup | Yes | We follow common practices (Liu et al., 2021) to initialize the backbone with pre-trained classification weights, and train models using a 3× (36 epochs) schedule by default. The models are typically trained with the stochastic gradient descent (SGD) optimizer with a learning rate of 10⁻³. Weight decay of 10⁻⁴ and momentum of 0.9 are used. We use synchronized SGD over 4 GPUs with a total of 8 images per minibatch (2 images per GPU). During the training process, we resize the input images such that the shortest side is at most 800 pixels while the longest is at most 1333 (Chen et al., 2019). We use horizontal image flipping as the only form of data augmentation. During the testing process, we resize the input images to a fixed size of 800×1333. |
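The aspect-ratio-preserving resize rule quoted in the Experiment Setup row (shortest side at most 800 px, longest at most 1333 px) can be sketched as a small helper. This is an illustrative reimplementation of the standard keep-ratio resize convention, not code from the BoS repository; the function name and signature are hypothetical.

```python
def resize_keep_ratio(w: int, h: int, short_max: int = 800, long_max: int = 1333) -> tuple[int, int]:
    """Compute a target (width, height) so the shorter side is at most
    `short_max` and the longer side is at most `long_max`, preserving
    aspect ratio. Hypothetical helper mirroring the quoted resize rule."""
    # Take the more restrictive of the two scale factors.
    scale = min(short_max / min(w, h), long_max / max(w, h))
    return round(w * scale), round(h * scale)


# A 1920x1080 frame is capped by the long-side limit (1333),
# so the short side ends up below 800.
print(resize_keep_ratio(1920, 1080))  # (1333, 750)
```

For a 4:3 image such as 640x480, the short-side limit dominates and the image is scaled so the shorter side is exactly 800, giving (1067, 800).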