Transferring Labels to Solve Annotation Mismatches Across Object Detection Datasets
Authors: Yuan-Hong Liao, David Acuna, Rafid Mahmood, James Lucas, Viraj Uday Prabhu, Sanja Fidler
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Validating across four object detection scenarios, defined over seven different datasets and three different architectures, we show that transferring labels for a target task via LGPL consistently improves the downstream detection in every setting, on average by 1.88 mAP and 2.65 AP75. |
| Researcher Affiliation | Collaboration | NVIDIA; University of Toronto, Vector Institute; University of Ottawa; Georgia Institute of Technology |
| Pseudocode | No | The paper provides algorithmic descriptions and diagrams but does not include formal pseudocode or a clearly labeled algorithm block. |
| Open Source Code | Yes | Project website will be at: https://andrewliao11.github.io/label-transfer |
| Open Datasets | Yes | We create four scenarios from five real-world datasets: Cityscapes (Cordts et al., 2016), Mapillary Vistas Dataset (MVD) (Neuhold et al., 2017), Waymo (Sun et al., 2020), nuScenes, and nuImages (Caesar et al., 2020); and two synthetic datasets: Synscapes (Wrenninge & Unger, 2018) and Internal-Dataset, an internal dataset that we leave blinded for anonymity. |
| Dataset Splits | Yes | Cityscapes (Cordts et al., 2016) contains 2975 training and 500 validation images. (...) For each experiment, we run on three random seeds, except for Deformable DETR due to its long training time. (A seeding sketch follows the table.) |
| Hardware Specification | Yes | All experiments are run on NVIDIA Tesla V100 GPUs. |
| Software Dependencies | No | The paper mentions software components and frameworks like 'YOLOv3', 'Def-DETR', 'Faster-RCNN', 'Cascade-RCNN', 'HRNet-w32', 'PyTorch', but does not specify their version numbers. |
| Experiment Setup | Yes | We sweep the learning rate for each downstream detector with grid search, while other hyper-parameters remain unchanged from the original papers. (...) For Faster-RCNN and Deformable DETR, we resize the image to (1800, 900) and randomly flip the image horizontally in the Synscapes → Cityscapes scenario, and resize the image to (1600, 900) and randomly flip the image horizontally in the other three scenarios. For YOLOv3, we resize the image to (1500, 800) and randomly flip the image horizontally in all scenarios. (...) All data-driven label transfer models (PL, PL & NF, and LGPL) adopt Cascade-RCNN (Cai & Vasconcelos, 2018) and use ImageNet-pretrained HRNet-w32 (Sun et al., 2019) as the image backbone with batch size 16. (...) We sweep the learning rate with the values 0.01, 0.02, 0.03, 0.04 and with the values 0.01, 0.02, 0.03. (A sketch of this resize-flip-and-sweep recipe follows the table.) |
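The Dataset Splits row notes that each experiment is run on three random seeds. As a minimal sketch of one common way to fix seeds for multi-seed PyTorch runs (the helper name `set_seed` and the seed values are illustrative, not taken from the paper):

```python
# Hedged sketch: seeding all relevant RNGs for a multi-seed experiment.
# Not the authors' code; seed values and helper name are illustrative.
import random

import numpy as np
import torch

def set_seed(seed: int) -> None:
    """Seed Python, NumPy, and PyTorch (CPU and all GPUs)."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)

for seed in (0, 1, 2):  # "three random seeds" per experiment
    set_seed(seed)
    # ... train and evaluate one run here ...
```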
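The Experiment Setup row describes fixed-size resizing with random horizontal flips and a grid search over learning rates. Below is a minimal sketch of that recipe, assuming torchvision-style transforms; `build_detector`, `train_fn`, and `eval_fn` are hypothetical stand-ins for the authors' training pipeline, which is not released in this form:

```python
# Hedged sketch of the preprocessing and learning-rate sweep quoted above.
# Resize targets and flip probability follow the quoted setup; everything
# else is a placeholder, not the authors' implementation.
import torch
import torchvision.transforms.functional as F

class ResizeAndRandomFlip:
    """Resize to a fixed (width, height) and flip horizontally with p=0.5."""

    def __init__(self, size_wh=(1600, 900), flip_prob=0.5):
        self.size_wh = size_wh
        self.flip_prob = flip_prob

    def __call__(self, image):
        w, h = self.size_wh
        image = F.resize(image, [h, w])  # torchvision expects [height, width]
        if torch.rand(1).item() < self.flip_prob:
            image = F.hflip(image)  # box coordinates would need flipping too
        return image

# Grid search over learning rates: train once per candidate, keep the best
# validation mAP; all other hyper-parameters stay fixed, as in the paper.
def sweep_learning_rates(build_detector, train_fn, eval_fn,
                         lrs=(0.01, 0.02, 0.03, 0.04)):
    best_lr, best_map = None, float("-inf")
    for lr in lrs:
        model = build_detector()
        train_fn(model, lr=lr)
        val_map = eval_fn(model)  # downstream detection mAP
        if val_map > best_map:
            best_lr, best_map = lr, val_map
    return best_lr, best_map
```

The two-value grids in the quoted text (0.01–0.04 and 0.01–0.03) suggest different sweep ranges for different detectors; the `lrs` default above shows only the larger grid.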