Structured Semantic Transfer for Multi-Label Recognition with Partial Labels
Authors: Tianshui Chen, Tao Pu, Hefeng Wu, Yuan Xie, Liang Lin
AAAI 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on the Microsoft COCO, Visual Genome and Pascal VOC datasets show that the proposed SST framework obtains superior performance over current state-of-the-art algorithms. |
| Researcher Affiliation | Academia | Tianshui Chen 1, Tao Pu 2, Hefeng Wu 2, Yuan Xie 2, Liang Lin 2 * 1 Guangdong University of Technology, 2 Sun Yat-Sen University |
| Pseudocode | No | The paper describes the algorithms in text and with equations but does not include a formal pseudocode or algorithm block. |
| Open Source Code | Yes | Codes are available at https://github.com/HCPLab-SYSU/HCP-MLR-PL. |
| Open Datasets | Yes | Datasets. We follow previous works (Durand, Mehrasa, and Mori 2019) to conduct experiments on the MS-COCO (Lin et al. 2014), Visual Genome (Krishna et al. 2016), and Pascal VOC 2007 (Everingham et al. 2010) datasets for evaluation. |
| Dataset Splits | Yes | MS-COCO contains about 120k images... further divided into a training set of about 80k images and a validation set of about 40k images. Visual Genome contains 108,249 images... We randomly select 10,000 images as the test set and the remaining 98,249 images as the training set. Pascal VOC 2007... divided into a trainval set of about 5,011 images and a test set of 4,952 images. (A split sketch follows the table.) |
| Hardware Specification | No | The paper does not provide specific details on the hardware used, such as GPU/CPU models or memory. |
| Software Dependencies | No | The paper mentions using a ResNet-101 backbone and the ADAM algorithm, but does not provide specific software dependency versions (e.g., PyTorch 1.x, TensorFlow 2.x, CUDA version). |
| Experiment Setup | Yes | The model is trained using the ADAM algorithm (Kingma and Ba 2015) with a batch size of 32, momentums of 0.999 and 0.9, and a weight decay of 5 * 10^-4. The original learning rate is set to 0.00001, and it is divided by 10 every 10 epochs; the model is trained for 20 epochs in total. During training, the input image is resized to 512 * 512, and a number is randomly chosen from {512, 448, 384, 320, 256} as the width and height to crop a patch. Finally, the cropped patch is resized to 448 * 448. θ_intra and θ_inter... are set to 1 during the first 5 epochs... Then, they are set to 0.95 at epoch 6 and are decreased by 0.025 every epoch until they reach the minimum θ_intra and θ_inter, respectively. Both minimum values are set to 0.75 based on the experimental results. |
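
The Dataset Splits row reports that Visual Genome is divided by randomly holding out 10,000 of its 108,249 images for testing. A minimal sketch of such a split, assuming image IDs are available as a list; the function name and the fixed seed are hypothetical choices for reproducibility, not taken from the paper:

```python
import random

def split_visual_genome(image_ids, num_test=10000, seed=0):
    """Randomly hold out num_test images for testing (paper: 10,000 of
    108,249), keeping the remaining 98,249 images for training."""
    ids = list(image_ids)
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(ids)
    return ids[num_test:], ids[:num_test]  # (train_ids, test_ids)
```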
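
The Experiment Setup row fully specifies the optimizer, learning-rate schedule, and crop augmentation. A hedged PyTorch sketch of that configuration follows; `model` is assumed, the helper names are hypothetical, and the authors' released code at HCP-MLR-PL may differ in details:

```python
import random
import torch
import torchvision.transforms.functional as TF

def make_optimizer(model):
    # ADAM with momentums (betas) of 0.9/0.999, weight decay 5e-4,
    # and an initial learning rate of 0.00001, as reported
    return torch.optim.Adam(model.parameters(), lr=1e-5,
                            betas=(0.9, 0.999), weight_decay=5e-4)

def make_scheduler(optimizer):
    # learning rate divided by 10 every 10 epochs (20 epochs total)
    return torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1)

class RandomScaleCrop:
    """Resize to 512x512, crop a square whose side is drawn from the
    reported scale set, then resize the crop to 448x448."""
    SCALES = (512, 448, 384, 320, 256)

    def __call__(self, img):
        img = TF.resize(img, [512, 512])
        side = random.choice(self.SCALES)
        top = random.randint(0, 512 - side)   # random crop position
        left = random.randint(0, 512 - side)
        return TF.resize(TF.crop(img, top, left, side, side), [448, 448])
```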
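
The θ_intra / θ_inter schedule in the same row also reduces to a small closed form. A sketch, assuming 1-based epoch indexing (an assumption; the paper does not state it):

```python
def semantic_threshold(epoch, theta_min=0.75):
    """Threshold schedule: 1.0 for the first 5 epochs, 0.95 at epoch 6,
    then decreased by 0.025 per epoch, floored at theta_min (0.75)."""
    if epoch <= 5:
        return 1.0
    return max(theta_min, 0.95 - 0.025 * (epoch - 6))
```

Under this indexing, the thresholds reach the 0.75 floor at epoch 14 and remain there for the rest of the 20-epoch run.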