Recognize Any Regions

Authors: Haosen Yang, Chuofan Ma, Bin Wen, Yi Jiang, Zehuan Yuan, Xiatian Zhu

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments in open-world object recognition show that our RegionSpot achieves significant performance gain over prior alternatives
Researcher Affiliation | Collaboration | Haosen Yang¹, Chuofan Ma², Bin Wen³, Yi Jiang³, Zehuan Yuan³, Xiatian Zhu¹; ¹University of Surrey, ²The University of Hong Kong, ³ByteDance
Pseudocode | No | The paper describes the model architecture and process flow but does not include an explicit pseudocode block or algorithm.
Open Source Code | No | The code will be available after being accepted.
Open Datasets | Yes | For training, we utilized publicly available detection datasets, comprising a total of approximately 3 million images. These datasets include Objects365 (O365) [29], Open Images (OI) [15], and V3Det (V3D) [33]
Dataset Splits | Yes | We utilized the extensive LVIS detection dataset [8], which encompasses 1203 categories and 19809 images reserved for validation.
Hardware Specification | Yes | training our model with 3 million data in a single day using 8 V100 GPUs.
Software Dependencies | No | The paper mentions the use of the AdamW optimizer, but does not specify programming language versions or specific library versions like PyTorch or TensorFlow.
Experiment Setup | Yes | We train RegionSpot using AdamW [13] optimizer with the initial learning rate as 2.5 × 10^-5. All models are trained with a mini-batch size 16 on 8 GPUs. The default training schedule is 450K iterations, with the learning rate divided by 10 at 350K and 420K iterations. The model is trained for 450K iterations at each stage. ... using input image resolution of 336.
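
The step schedule quoted in the Experiment Setup entry can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the initial rate is read as 2.5 × 10^-5, iterations are counted from 0, and each drop is assumed to take effect exactly at the milestone iteration. The names `BASE_LR`, `MILESTONES`, `TOTAL_ITERS`, and `learning_rate` are hypothetical and do not come from the paper, which does not name its framework or scheduler.

```python
# Illustrative sketch of the quoted step learning-rate schedule.
# All names here are hypothetical; the paper does not publish this code.

BASE_LR = 2.5e-5                 # initial learning rate (as quoted)
MILESTONES = (350_000, 420_000)  # iterations where the rate is divided by 10
TOTAL_ITERS = 450_000            # default schedule length per stage
DECAY_FACTOR = 0.1               # "divided by 10"


def learning_rate(iteration: int) -> float:
    """Return the learning rate in effect at a 0-based iteration.

    Assumption: a drop applies from the milestone iteration onward.
    """
    if not 0 <= iteration < TOTAL_ITERS:
        raise ValueError(f"iteration must be in [0, {TOTAL_ITERS})")
    # Count how many milestones have been reached so far.
    drops = sum(iteration >= m for m in MILESTONES)
    return BASE_LR * DECAY_FACTOR ** drops


if __name__ == "__main__":
    # Print the rate just before and at each milestone.
    for it in (0, 349_999, 350_000, 419_999, 420_000, 449_999):
        print(f"iter {it:>7}: lr = {learning_rate(it):.2e}")
```

In a framework such as PyTorch, a multi-step scheduler with a decay factor of 0.1 would play the same role; the pure-Python form is used here because the summary notes that the paper does not specify its software dependencies.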