Transferring Labels to Solve Annotation Mismatches Across Object Detection Datasets

Authors: Yuan-Hong Liao, David Acuna, Rafid Mahmood, James Lucas, Viraj Uday Prabhu, Sanja Fidler

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Validating across four object detection scenarios, defined over seven different datasets and three different architectures, we show that transferring labels for a target task via LGPL consistently improves the downstream detection in every setting, on average by 1.88 mAP and 2.65 AP75.
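For context on the reported metrics: AP75 is average precision computed at an intersection-over-union (IoU) threshold of 0.75, a stricter localization criterion than the 0.5 threshold of AP50, while mAP averages AP over a range of IoU thresholds. A minimal IoU sketch for axis-aligned boxes (illustrative only, not code from the paper):

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes in (x1, y1, x2, y2) format."""
    # Corners of the intersection rectangle.
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    # Clamp to zero when the boxes do not overlap.
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)
```

A predicted box counts as a true positive for AP75 only if its IoU with a ground-truth box of the same class is at least 0.75.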
Researcher Affiliation | Collaboration | NVIDIA; University of Toronto, Vector Institute; University of Ottawa; Georgia Institute of Technology
Pseudocode | No | The paper provides algorithmic descriptions and diagrams but does not include formal pseudocode or a clearly labeled algorithm block.
Open Source Code | Yes | Project website will be at: https://andrewliao11.github.io/label-transfer
Open Datasets | Yes | We create four scenarios from five real-world datasets: Cityscapes (Cordts et al., 2016), Mapillary Vistas Dataset (MVD) (Neuhold et al., 2017), Waymo (Sun et al., 2020), nuScenes, and nuImages (Caesar et al., 2020); and two synthetic datasets: Synscapes (Wrenninge & Unger, 2018) and Internal-Dataset, an internal dataset that we leave blinded for anonymity.
Dataset Splits | Yes | Cityscapes (Cordts et al., 2016) contains 2975 training and 500 validation images. (...) For each experiment, we run on three random seeds, except for Deformable DETR due to its long training time.
Hardware Specification | Yes | All experiments are run on NVIDIA Tesla V100 GPUs.
Software Dependencies | No | The paper mentions software components and frameworks such as YOLOv3, Deformable DETR, Faster R-CNN, Cascade R-CNN, HRNet-w32, and PyTorch, but does not specify their version numbers.
Experiment Setup | Yes | We sweep the learning rate for each downstream detector with grid search, while other hyper-parameters remain unchanged from the original papers. (...) For Faster R-CNN and Deformable DETR, we resize the image to (1800, 900) and randomly flip the image horizontally in the Synscapes → Cityscapes scenario, and resize the image to (1600, 900) and randomly flip the image horizontally in the other three scenarios. For YOLOv3, we resize the image to (1500, 800) and randomly flip the image horizontally in all scenarios. (...) All data-driven label transfer models (PL, PL & NF, and LGPL) adopt Cascade R-CNN (Cai & Vasconcelos, 2018) and use ImageNet-pretrained HRNet-w32 (Sun et al., 2019) as the image backbone with batch size 16. (...) We sweep the learning rate with the values 0.01, 0.02, 0.03, 0.04 and with the values 0.01, 0.02, 0.03.
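The learning-rate grid search described above can be sketched as follows. This is an illustrative outline under stated assumptions, not code from the paper; `train_and_eval` is a hypothetical stand-in for the full train-then-evaluate-on-validation pipeline.

```python
def sweep_learning_rate(learning_rates, train_and_eval):
    """Train once per candidate learning rate and keep the best by validation score.

    train_and_eval(lr) is assumed to train a detector with learning rate lr
    (all other hyper-parameters fixed) and return its validation mAP.
    """
    results = {lr: train_and_eval(lr) for lr in learning_rates}
    best_lr = max(results, key=results.get)
    return best_lr, results[best_lr]

# Per the paper, the swept values are e.g. [0.01, 0.02, 0.03, 0.04]
# for some detectors and [0.01, 0.02, 0.03] for others.
```

Keeping the sweep to a single hyper-parameter, as the authors do, makes the comparison across label-transfer methods attributable to the labels rather than to per-method tuning.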