Semantic-Aware Representation Blending for Multi-Label Image Recognition with Partial Labels
Authors: Tao Pu, Tianshui Chen, Hefeng Wu, Liang Lin
AAAI 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on the MS-COCO, Visual Genome, and Pascal VOC 2007 datasets show that the proposed SARB framework obtains superior performance over current leading competitors on all known label proportion settings, i.e., with mAP improvements of 4.6%, 4.6%, and 2.2% on these three datasets when the known label proportion is 10%. |
| Researcher Affiliation | Academia | Tao Pu¹, Tianshui Chen², Hefeng Wu¹, Liang Lin¹* (¹Sun Yat-Sen University, ²Guangdong University of Technology); putao3@mail2.sysu.edu.cn, tianshuichen@gmail.com, wuhefeng@gmail.com, linliang@ieee.org |
| Pseudocode | No | The paper describes the proposed modules and their operations textually and mathematically but does not include explicit pseudocode or algorithm blocks. |
| Open Source Code | Yes | Codes are available at https://github.com/HCPLab-SYSU/HCP-MLR-PL. |
| Open Datasets | Yes | We conduct experiments on the MS-COCO (Lin et al. 2014), Visual Genome (Krishna et al. 2016), and Pascal VOC 2007 (Everingham et al. 2010) datasets for fair comparison. |
| Dataset Splits | Yes | MS-COCO covers 80 daily-life categories and contains 82,801 images as the training set and 40,504 images as the validation set. Pascal VOC 2007 contains 9,963 images from 20 object categories, and we follow previous works in using the trainval set for training and the test set for evaluation. Visual Genome contains 108,249 images from 80,138 categories, and most categories have very few samples. In this work, we select the 200 most frequent categories to obtain a VG-200 subset. Moreover, since there is no train/val split, we randomly select 10,000 images as the test set and use the remaining 98,249 images as the training set. The train/test split will be released for further research. (A hedged sketch of this split appears below the table.) |
| Hardware Specification | No | No specific hardware details (such as GPU or CPU models) used for running experiments are provided in the paper. |
| Software Dependencies | No | No specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow, CUDA) are provided in the paper. |
| Experiment Setup | Yes | During training, we use the Adam algorithm (Kingma and Ba 2015) with a batch size of 16, momentums of 0.999 and 0.9, and a weight decay of 5×10⁻⁴. We set the initial learning rate to 10⁻⁵ and divide it by 10 after every 10 epochs; the model is trained for 20 epochs in total. For data augmentation, the input image is resized to 512×512, and we randomly choose a number from {512, 448, 384, 320, 256} as the width and height to crop a patch, which is then resized to 448×448. Random horizontal flipping is also used. To stabilize the training process, we start using the ILRB and PLRB modules at epoch 5 and re-compute the prototypes of each category every 5 epochs. (A hedged PyTorch sketch of this setup appears below the table.) |
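
The VG-200 split described above is a simple random partition. The following is a minimal sketch of how such a split could be reproduced; the file name `vg_image_ids.txt` and the fixed seed are illustrative assumptions, since the paper does not state them and the authors promise to release the actual split.

```python
import random

# Hypothetical listing of the 108,249 VG-200 image IDs (one per line);
# the released split in the HCP-MLR-PL repository may differ.
with open("vg_image_ids.txt") as f:
    image_ids = [line.strip() for line in f if line.strip()]

random.seed(0)  # assumed seed; the paper does not specify one
random.shuffle(image_ids)

test_ids = image_ids[:10_000]    # 10,000 images for the test set
train_ids = image_ids[10_000:]   # remaining 98,249 images for training
```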
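
The training configuration quoted above maps directly onto standard PyTorch components. Below is a minimal sketch assuming a ResNet-101 backbone with 80 output classes (MS-COCO); the backbone choice, the `SquareRandomCrop` helper, and the bare training loop are illustrative assumptions rather than the authors' released implementation.

```python
import random
import torch
from torchvision import models, transforms

class SquareRandomCrop:
    """Crop a square patch whose side is drawn from a fixed set of sizes."""
    def __init__(self, sizes=(512, 448, 384, 320, 256)):
        self.sizes = sizes

    def __call__(self, img):
        side = random.choice(self.sizes)
        return transforms.RandomCrop(side)(img)

# Resize to 512x512, crop a randomly sized square patch,
# resize the patch to 448x448, and flip horizontally at random.
train_transform = transforms.Compose([
    transforms.Resize((512, 512)),
    SquareRandomCrop(),
    transforms.Resize((448, 448)),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])

# Assumed backbone; the quoted setup does not restate it.
model = models.resnet101(num_classes=80)

# Adam with betas (0.9, 0.999) -- the "momentums" in the quote --
# weight decay 5e-4, initial LR 1e-5 divided by 10 every 10 epochs.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-5,
                             betas=(0.9, 0.999), weight_decay=5e-4)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1)

for epoch in range(20):  # 20 epochs in total, batch size 16
    # ... one pass over the partially labeled training set ...
    if epoch >= 5:
        pass  # ILRB/PLRB blending is enabled from epoch 5 onward
    if epoch % 5 == 0:
        pass  # per-category prototypes are re-computed every 5 epochs
    scheduler.step()
```

The `StepLR` schedule with `step_size=10, gamma=0.1` is the standard way to express "divide the learning rate by 10 after every 10 epochs" in PyTorch.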