Modular Graph Transformer Networks for Multi-Label Image Classification

Authors: Hoang D. Nguyen, Xuan-Son Vu, Duc-Trong Le (pp. 9092-9100)

AAAI 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We validate our framework on MS-COCO and Fashion550K datasets to demonstrate improvements for multi-label image classification. The experiment results show significant mAP improvements of 9.7% on MS-COCO and 6.4% on Fashion550K compared to the baselines. MGTN outperforms the most recent state-of-the-art models by increments of 0.4% and 3.7% in mAP on MS-COCO and Fashion550K, respectively.
Researcher Affiliation | Academia | Hoang D. Nguyen (1), Xuan-Son Vu (2), Duc-Trong Le (3); (1) School of Computing Science, University of Glasgow, Singapore; (2) Department of Computing Science, Umeå University, Sweden; (3) University of Engineering and Technology, Vietnam National University, Vietnam
Pseudocode | No | The paper provides mathematical equations (e.g., Eq. 1, 2, 3, 4, 5) to describe parts of the model and calculations, but it does not include a distinct pseudocode block or algorithm section.
Open Source Code | Yes | The source code is available at https://github.com/ReML-AI/MGTN.
Open Datasets | Yes | MS-COCO (Lin et al. 2014) is the most popular multi-label image dataset. Its main features include object segmentation, recognition in context, and five captions per image, among others. In total, it contains 2.5M labelled object instances in 328K images, of which 82,783 are training, 40,504 validation, and 40,775 test images. Fashion550K (Inoue et al. 2017) is a multi-label fashion dataset. It contains 66 unique weakly-annotated tags with 407,772 images. ... This clean set has 3K, 300, and 2K images for training, validation, and testing, respectively.
Dataset Splits | Yes | MS-COCO (Lin et al. 2014) ... of which 82,783 are training, 40,504 validation, and 40,775 test images. Fashion550K (Inoue et al. 2017) ... This clean set has 3K, 300, and 2K images for training, validation, and testing, respectively.
Hardware Specification | Yes | The experiments were run on two Nvidia Tesla V100 GPUs, each with 16GB of memory.
Software Dependencies | Yes | Our proposed MGTN framework is developed using PyTorch (version 1.3.1).
Experiment Setup | Yes | We configure our model with two GCN layers and output dimensionalities of 2048 and 4096... We set the threshold τ to 0.999... For the graph transformer layer, ... we set T = [0.2, 0.4, 1.0] for MS-COCO and T = [0.1, 0.3, 1.0] for Fashion550K. A negative slope of 0.2 ... is set for image representation learning using LeakyReLU... For data augmentation during training, ... we resize images to 512 × 512 and randomly crop regions of 448 × 448 with random horizontal flips. We adopt SGD as the optimiser with momentum set to 0.9. Weight decay is 10^-4. The initial learning rates are 0.03 and 0.01... The learning rate decays by a factor of 10 every 20 epochs, and the network is trained for 60 epochs in total.
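
The reported setup maps onto a fairly standard PyTorch training loop. The sketch below is a minimal, unofficial reconstruction of that configuration: the transform pipeline, optimiser, and learning-rate schedule follow the values quoted above, while build_mgtn, train_loader, the loss function, and the ImageNet normalisation statistics are assumptions made for illustration rather than the authors' actual code (see https://github.com/ReML-AI/MGTN for the reference implementation).

import torch
import torch.nn as nn
import torchvision.transforms as transforms
from torch.optim import SGD
from torch.optim.lr_scheduler import StepLR

# Data augmentation as reported: resize to 512 x 512, random crop of 448 x 448,
# random horizontal flip. The normalisation statistics are the usual ImageNet
# values; the report above does not state them, so they are an assumption.
train_transform = transforms.Compose([
    transforms.Resize((512, 512)),
    transforms.RandomCrop(448),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

# Graph-side hyperparameters as reported: two GCN layers with output sizes
# 2048 and 4096, threshold tau = 0.999, and graph transformer thresholds
# T = [0.2, 0.4, 1.0] (MS-COCO) or [0.1, 0.3, 1.0] (Fashion550K).
# build_mgtn is a hypothetical constructor standing in for the released code.
model = build_mgtn(num_classes=80, gcn_dims=(2048, 4096),
                   tau=0.999, thresholds=[0.2, 0.4, 1.0])
model = nn.DataParallel(model).cuda()  # two Tesla V100 GPUs, as reported

# Optimiser and schedule as reported: SGD with momentum 0.9, weight decay 1e-4,
# initial LR 0.03 (MS-COCO) or 0.01 (Fashion550K), decayed by 10x every 20 epochs.
optimizer = SGD(model.parameters(), lr=0.03, momentum=0.9, weight_decay=1e-4)
scheduler = StepLR(optimizer, step_size=20, gamma=0.1)
criterion = nn.MultiLabelSoftMarginLoss()  # a common multi-label loss; assumed, not stated above

for epoch in range(60):  # 60 epochs in total, as reported
    for images, targets in train_loader:  # train_loader is assumed to yield (image, multi-hot label) batches
        optimizer.zero_grad()
        loss = criterion(model(images.cuda()), targets.cuda())
        loss.backward()
        optimizer.step()
    scheduler.step()

Under the same assumptions, switching to Fashion550K would mean changing num_classes to 66, the thresholds to [0.1, 0.3, 1.0], and the initial learning rate to 0.01, in line with the values quoted above.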