Multi-View Visual Semantic Embedding

Authors: Zheng Li, Caili Guo, Zerun Feng, Jenq-Neng Hwang, Xijun Xue

IJCAI 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results on the Flickr30K and MS-COCO datasets demonstrate the superior performance of our framework.
Researcher Affiliation | Collaboration | (1) Beijing Key Laboratory of Network System Architecture and Convergence, Beijing University of Posts and Telecommunications; (2) Beijing Laboratory of Advanced Information Networks, Beijing University of Posts and Telecommunications; (3) University of Washington; (4) China Telecom System Integration Co., Ltd. Emails: {lizhengzachary, guocaili, fengzerun}@bupt.edu.cn, hwang@uw.edu, xuexj@chinatelecom.cn
Pseudocode | No | The paper describes its framework and methods but does not include structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide access to source code for the described method.
Open Datasets | Yes | We evaluate our method on two standard benchmarks: Flickr30K [Young et al., 2014] and MS-COCO [Lin et al., 2014].
Dataset Splits | Yes | The Flickr30K dataset contains 31,000 images, each annotated with 5 sentences. Following the data split of [Faghri et al., 2018], we use 1,000 images for validation, 1,000 images for testing, and the remaining images for training. The MS-COCO dataset contains 123,287 images, each with 5 sentences. We follow the same split setting as [Faghri et al., 2018]: 113,287 images for training, 5,000 images for validation, and 5,000 images for testing. We report results on both 1,000 test images (averaged over 5 folds) and the full 5,000 test images.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used to run its experiments.
Software Dependencies | No | The paper mentions software components such as ResNet, Faster R-CNN, BiGRU, BERT-base, and Sentence-BERT, but does not give version numbers for these or any other dependencies.
Experiment Setup | Yes | The parameters are set to K = 3 and λ = 0.7 for both Flickr30K and MS-COCO.
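The split arithmetic and the "1,000 test images (averaged over 5 folds)" protocol from the Dataset Splits row can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' evaluation code, which is not released. It assumes the 5 captions of image i occupy caption indices 5i to 5i + 4. It also assumes the 1K results come from cutting the 5,000 MS-COCO test images into five contiguous folds of 1,000 images, as in the evaluation convention of [Faghri et al., 2018]. Only image-to-text Recall@K is shown, and all function names are hypothetical.

```python
from statistics import mean

CAPTIONS_PER_IMAGE = 5  # both datasets annotate each image with 5 sentences


def train_size(total_images: int, val_images: int, test_images: int) -> int:
    """Images left for training after carving out validation and test sets."""
    return total_images - val_images - test_images


def i2t_recall_at_k(sims: list[list[float]], k: int) -> float:
    """Image-to-text Recall@K in percent (hypothetical helper).

    sims[i][j] is the similarity between image i and caption j. Assumption:
    captions 5*i .. 5*i + 4 are the ground-truth captions of image i.
    """
    hits = 0
    for i, row in enumerate(sims):
        # Rank caption indices by descending similarity.
        ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        gt = set(range(CAPTIONS_PER_IMAGE * i, CAPTIONS_PER_IMAGE * (i + 1)))
        if gt & set(ranked[:k]):  # hit if any ground-truth caption is in the top k
            hits += 1
    return 100.0 * hits / len(sims)


def five_fold_1k_recall(sims: list[list[float]], k: int,
                        fold_size: int = 1000, n_folds: int = 5) -> float:
    """Average Recall@K over contiguous, disjoint folds of fold_size images.

    Each fold keeps only its own images and their captions, so retrieval
    inside a fold is over fold_size * 5 candidate captions.
    """
    scores = []
    for f in range(n_folds):
        img = slice(f * fold_size, (f + 1) * fold_size)
        cap = slice(f * fold_size * CAPTIONS_PER_IMAGE,
                    (f + 1) * fold_size * CAPTIONS_PER_IMAGE)
        fold = [row[cap] for row in sims[img]]
        scores.append(i2t_recall_at_k(fold, k))
    return mean(scores)


if __name__ == "__main__":
    print(train_size(31_000, 1_000, 1_000))   # 29000 (Flickr30K)
    print(train_size(123_287, 5_000, 5_000))  # 113287 (MS-COCO)
```

On the real MS-COCO 5K test set, `sims` would be a 5,000 × 25,000 matrix and the defaults `fold_size=1000, n_folds=5` give the 1K protocol. Passing a small `fold_size` lets the same code be checked on toy matrices. Text-to-image recall would be computed analogously on the transposed matrix.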