Supervision Interpolation via LossMix: Generalizing Mixup for Object Detection and Beyond
Authors: Thanh Vu, Baochen Sun, Bodi Yuan, Alex Ngai, Yueqi Li, Jan-Michael Frahm
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical results on the PASCAL VOC and MS COCO datasets demonstrate that LossMix can consistently outperform state-of-the-art methods widely adopted for detection. |
| Researcher Affiliation | Collaboration | 1University of North Carolina at Chapel Hill 2Mineral {thanhvu,baochens,bodiyuan}@mineral.ai, {alexander.s.ngai,yueqili.innovation}@gmail.com, jmf@cs.unc.edu |
| Pseudocode | No | No structured pseudocode or algorithm blocks were found in the paper. |
| Open Source Code | No | The paper mentions leveraging the 'open-source PyTorch-based Detectron2 (Wu et al. 2019) repository as our object detection codebase' and 'leverage the open source code of the state-of-the-art Adaptive Teacher (Li et al. 2022b) framework', but does not provide a link or explicit statement about the availability of *their own* LossMix implementation code. |
| Open Datasets | Yes | We conduct experiments on two standard benchmark datasets in object detection, namely PASCAL VOC (Everingham et al. 2010) and MS COCO (Lin et al. 2014). We follow (Zhang et al. 2019) and use the combination of PASCAL VOC 2007 trainval (5k images) and 2012 trainval (12k images) for training. MS COCO (Lin et al. 2014) is composed of 80 object categories and is 10 times larger than PASCAL VOC. We conduct our experiments for cross-domain object detection using two popular and challenging real-to-artistic adaptation setups (Chen et al. 2020; Deng et al. 2021; Kim et al. 2019; Li et al. 2022b; Saito et al. 2019; Shen et al. 2019; Xu et al. 2020): PASCAL VOC (Everingham et al. 2010) → Clipart1k (Inoue et al. 2018) and PASCAL VOC (Everingham et al. 2010) → Watercolor2k (Inoue et al. 2018). |
| Dataset Splits | Yes | We follow (Zhang et al. 2019) and use the combination of PASCAL VOC 2007 trainval (5k images) and 2012 trainval (12k images) for training. Together they make up 16,551 images of 20 categories of common, real-world objects, each with fully annotated bounding boxes and class labels. The evaluation is done on the PASCAL VOC 2007 test set (5k images). MS COCO (Lin et al. 2014) is composed of 80 object categories and is 10 times larger than PASCAL VOC. We train on train2017 (118K images) and evaluate on val2017 (5K images). The Clipart1k dataset shares the same set of object categories as PASCAL VOC and contains a total of 1000 images, which we split into 500 training and 500 test examples. The Watercolor2k dataset, which has 2000 images from 6 classes in common with PASCAL VOC, is split into 1000 training and 1000 test images. |
| Hardware Specification | Yes | All experiments were trained with 8 NVIDIA GPUs, either V100 or A100. |
| Software Dependencies | No | The paper mentions using 'PyTorch-based Detectron2' but does not specify version numbers for these software components or any other libraries. |
| Experiment Setup | Yes | Unless otherwise specified, we use a batch size of 64 for faster convergence, an initial learning rate of 0.08, and the default step scheduler from Detectron2. We train PASCAL VOC for 18K iterations, which is about 70 epochs, and MS COCO for 270K iterations, or roughly 146.4 epochs. |
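For context on the paper's core idea (interpolating supervision at the loss level rather than the label level, generalizing mixup), the following is a minimal classification-flavored sketch. The function names and toy numbers are illustrative assumptions, not the paper's detection-head formulation or released code:

```python
import math

def cross_entropy(probs, label):
    """Negative log-likelihood of the correct class (toy helper)."""
    return -math.log(probs[label])

def loss_level_mix(probs_mixed, label_a, label_b, lam):
    """Sketch of loss-level supervision interpolation: score the
    prediction on the mixed input against each source label, then
    mix the two losses with the same coefficient lam used to mix
    the inputs. Hypothetical name; not the paper's exact LossMix
    formulation for object detection."""
    return (lam * cross_entropy(probs_mixed, label_a)
            + (1 - lam) * cross_entropy(probs_mixed, label_b))

# Toy example: softmax output on an image mixed from a class-0 image
# (weight lam) and a class-2 image (weight 1 - lam).
probs = [0.6, 0.1, 0.3]
loss = loss_level_mix(probs, label_a=0, label_b=2, lam=0.7)
```

Mixing the losses instead of the labels is what lets this style of interpolation extend beyond classification, since detection targets (boxes plus classes) cannot be linearly blended the way one-hot labels can.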