Transferring Labels to Solve Annotation Mismatches Across Object Detection Datasets

Authors: Yuan-Hong Liao, David Acuna, Rafid Mahmood, James Lucas, Viraj Uday Prabhu, Sanja Fidler

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Validating across four object detection scenarios, defined over seven different datasets and three different architectures, we show that transferring labels for a target task via LGPL consistently improves the downstream detection in every setting, on average by 1.88 mAP and 2.65 AP75.
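For context on the reported metrics: AP75 is average precision computed at an intersection-over-union (IoU) threshold of 0.75, a stricter localization criterion than the 0.5 threshold of AP50, while mAP averages AP over a range of IoU thresholds. A minimal IoU sketch for axis-aligned boxes (illustrative only, not code from the paper):

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes in (x1, y1, x2, y2) format."""
    # Corners of the intersection rectangle.
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    # Clamp to zero when the boxes do not overlap.
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)
```

A predicted box counts as a true positive for AP75 only if its IoU with a ground-truth box of the same class is at least 0.75.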
Researcher Affiliation | Collaboration | NVIDIA; University of Toronto, Vector Institute; University of Ottawa; Georgia Institute of Technology
Pseudocode | No | The paper provides algorithmic descriptions and diagrams but does not include formal pseudocode or a clearly labeled algorithm block.
Open Source Code | Yes | Project website will be at: https://andrewliao11.github.io/label-transfer
Open Datasets | Yes | We create four scenarios from five real-world datasets: Cityscapes (Cordts et al., 2016), Mapillary Vistas Dataset (MVD) (Neuhold et al., 2017), Waymo (Sun et al., 2020), nuScenes, and nuImages (Caesar et al., 2020); and two synthetic datasets: Synscapes (Wrenninge & Unger, 2018) and Internal-Dataset, an internal dataset that we leave blinded for anonymity.
Dataset Splits | Yes | Cityscapes (Cordts et al., 2016) contains 2975 training and 500 validation images. (...) For each experiment, we run on three random seeds, except for Deformable DETR due to its long training time.
Hardware Specification | Yes | All experiments are run on NVIDIA Tesla V100 GPUs.
Software Dependencies | No | The paper mentions software components and frameworks such as YOLOv3, Deformable DETR, Faster R-CNN, Cascade R-CNN, HRNet-w32, and PyTorch, but does not specify their version numbers.
Experiment Setup | Yes | We sweep the learning rate for each downstream detector with grid search, while other hyper-parameters remain unchanged from the original papers. (...) For Faster R-CNN and Deformable DETR, we resize the image to (1800, 900) and randomly flip the image horizontally in the Synscapes → Cityscapes scenario, and resize the image to (1600, 900) and randomly flip the image horizontally in the other three scenarios. For YOLOv3, we resize the image to (1500, 800) and randomly flip the image horizontally in all scenarios. (...) All data-driven label transfer models (PL, PL & NF, and LGPL) adopt Cascade R-CNN (Cai & Vasconcelos, 2018) and use ImageNet-pretrained HRNet-w32 (Sun et al., 2019) as the image backbone with batch size 16. (...) We sweep the learning rate with the values 0.01, 0.02, 0.03, 0.04 and with the values 0.01, 0.02, 0.03.
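The learning-rate grid search described above can be sketched as follows. This is an illustrative outline under stated assumptions, not code from the paper; `train_and_eval` is a hypothetical stand-in for the full train-then-evaluate-on-validation pipeline.

```python
def sweep_learning_rate(learning_rates, train_and_eval):
    """Train once per candidate learning rate and keep the best by validation score.

    train_and_eval(lr) is assumed to train a detector with learning rate lr
    (all other hyper-parameters fixed) and return its validation mAP.
    """
    results = {lr: train_and_eval(lr) for lr in learning_rates}
    best_lr = max(results, key=results.get)
    return best_lr, results[best_lr]

# Per the paper, the swept values are e.g. [0.01, 0.02, 0.03, 0.04]
# for some detectors and [0.01, 0.02, 0.03] for others.
```

Keeping the sweep to a single hyper-parameter, as the authors do, makes the comparison across label-transfer methods attributable to the labels rather than to per-method tuning.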