Cross-Modality Attention with Semantic Graph Embedding for Multi-Label Classification

Authors: Renchun You, Zhiyao Guo, Lei Cui, Xiang Long, Yingze Bao, Shilei Wen | pp. 12709-12716

AAAI 2020

Reproducibility

Variable | Result | LLM Response
Research Type | Experimental | Experiments on two multi-label image classification datasets (MS-COCO and NUS-WIDE) show our method outperforms other existing state-of-the-art methods. In addition, we validate our method on a large multi-label video classification dataset (YouTube-8M Segments) and the evaluation results demonstrate the generalization capability of our method. From Section 4, Experiments: To assess our model, we perform experiments on two benchmark multi-label image recognition datasets (MS-COCO (Lin et al. 2014) and NUS-WIDE (Chua et al. 2009)). We also validate the effectiveness of our model on one multi-label video recognition dataset (YouTube-8M Segments), and the results demonstrate the extensibility of our method.
Researcher Affiliation | Collaboration | Renchun You,1 Zhiyao Guo,2 Lei Cui,3 Xiang Long,1 Yingze Bao,1 Shilei Wen1 — 1 Baidu VIS; 2 Computer Science Department, Xiamen University, China; 3 Department of Computer Science and Technology, Tsinghua University, China. {yourenchun, longxiang, wenshilei}@baidu.com, {guozhiyao45, baoyingze}@gmail.com, cuil19@mails.tsinghua.edu.cn
Pseudocode | No | The paper describes methods using text and mathematical equations but does not include any labeled 'Pseudocode' or 'Algorithm' blocks.
Open Source Code | No | The paper does not contain any explicit statement about releasing source code or provide a link to a code repository for the methodology described.
Open Datasets | Yes | We perform experiments on two benchmark multi-label image recognition datasets (MS-COCO (Lin et al. 2014) and NUS-WIDE (Chua et al. 2009)). We also validate the effectiveness of our model on one multi-label video recognition dataset (YouTube-8M Segments).
Dataset Splits | No | The paper specifies training and testing splits for MS-COCO and NUS-WIDE (e.g., '82,081 images for training and 40,137 images for testing' for MS-COCO), but does not explicitly mention a separate validation set split or its size.
Hardware Specification | No | The paper does not specify any hardware details such as GPU models, CPU types, or memory used for running the experiments. It mentions using 'ResNet-101 network' and 'Inception network', but these refer to models/architectures, not hardware.
Software Dependencies | No | The paper does not explicitly list any software dependencies with specific version numbers (e.g., Python, PyTorch, TensorFlow versions, or specific libraries with their versions).
Experiment Setup | Yes | In the ASGE module, the dimensions of the three hidden layers and label embeddings are all set to 256. The optimizer is Stochastic Gradient Descent (SGD) with momentum 0.9 and the initial learning rate is 0.01. The batch size is set to 64. The optimizer is SGD with momentum 0.9. Weight decay is 10^-5. The initial learning rate is 0.01 and decays by a factor of 10 every 30 epochs. The hyperparameter β in Eq. 12 is 0. in the MS-COCO dataset and 0.4 in the NUS-WIDE dataset. For the training of classification, the initial learning rate is 0.0002 and decays every 2×10^6 samples with momentum 0.8. The hyperparameter β in Eq. 12 is 0. The optimizer is SGD with momentum 0.9. The batch size is 256.
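The quoted image-classification schedule (SGD, momentum 0.9, initial learning rate 0.01, weight decay 10^-5, decay by a factor of 10 every 30 epochs, batch size 64) can be sketched in PyTorch. This is a minimal illustration only: the placeholder linear model and the 90-epoch count are assumptions, not details taken from the paper, whose actual backbone is ResNet-101.

```python
import torch

# Placeholder model (assumption for illustration); the paper uses ResNet-101.
model = torch.nn.Linear(256, 80)

# Settings quoted in the row above: SGD, momentum 0.9,
# initial learning rate 0.01, weight decay 1e-5.
optimizer = torch.optim.SGD(
    model.parameters(), lr=0.01, momentum=0.9, weight_decay=1e-5
)

# Learning rate decays by a factor of 10 every 30 epochs.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1)

for epoch in range(90):  # epoch count is an assumed placeholder
    # ... one pass over the training set in batches of 64 would go here ...
    scheduler.step()
```

After 90 epochs the schedule has decayed three times, so the learning rate ends at 0.01 × 0.1³ = 10^-5.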