Multi-View Visual Semantic Embedding
Authors: Zheng Li, Caili Guo, Zerun Feng, Jenq-Neng Hwang, Xijun Xue
IJCAI 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results on the Flickr30K and MS-COCO datasets demonstrate the superior performance of our framework. |
| Researcher Affiliation | Collaboration | (1) Beijing Key Laboratory of Network System Architecture and Convergence, Beijing University of Posts and Telecommunications; (2) Beijing Laboratory of Advanced Information Networks, Beijing University of Posts and Telecommunications; (3) University of Washington; (4) China Telecom System Integration Co., Ltd. Emails: {lizhengzachary, guocaili, fengzerun}@bupt.edu.cn, hwang@uw.edu, xuexj@chinatelecom.cn |
| Pseudocode | No | The paper describes its framework and methods but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described. |
| Open Datasets | Yes | We evaluate our method on two standard benchmarks: Flickr30K [Young et al., 2014] and MS-COCO [Lin et al., 2014]. |
| Dataset Splits | Yes | Flickr30K dataset contains 31,000 images, each image is annotated with 5 sentences. Following the data split of [Faghri et al., 2018], we use 1,000 images for validation, 1,000 images for testing, and the remaining for training. MS-COCO dataset contains 123,287 images, and each image comes with 5 sentences. We mirror the data split setting of [Faghri et al., 2018]. More specifically, we use 113,287 images for training, 5,000 images for validation, and 5,000 images for testing. We report results on both 1,000 test images (averaged over 5 folds) and the full 5,000 test images. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments. |
| Software Dependencies | No | The paper mentions software components like ResNet, Faster R-CNN, BiGRU, BERT-base, and Sentence-BERT, but does not provide specific version numbers for these or other software dependencies. |
| Experiment Setup | Yes | Parameters are set as K = 3, λ = 0.7, for both Flickr30K and MS-COCO. |
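
The split sizes quoted in the Dataset Splits row can be checked with a few lines of arithmetic. The sketch below is illustrative only and is not taken from the paper: the names `SPLITS`, `train_size`, `one_k_folds`, and `mean_over_folds` are hypothetical, and the assumption that the five 1,000-image folds are contiguous, non-overlapping index ranges is ours, since the row only states that the 1K results are averaged over 5 folds.

```python
# Image counts exactly as quoted in the Dataset Splits row.
SPLITS = {
    "Flickr30K": {"total": 31_000, "val": 1_000, "test": 1_000},
    "MS-COCO": {"total": 123_287, "train": 113_287, "val": 5_000, "test": 5_000},
}


def train_size(split):
    """Return the stated training size, or derive it as the remainder."""
    if "train" in split:
        return split["train"]
    return split["total"] - split["val"] - split["test"]


def one_k_folds(num_test_images, fold_size=1_000):
    """Partition test image indices into consecutive, non-overlapping folds.

    Assumption: folds are contiguous index ranges. The source only says
    that the 1K results are averaged over 5 folds.
    """
    return [
        range(start, min(start + fold_size, num_test_images))
        for start in range(0, num_test_images, fold_size)
    ]


def mean_over_folds(per_fold_scores):
    """Average one metric (for example a recall value) across folds."""
    return sum(per_fold_scores) / len(per_fold_scores)


if __name__ == "__main__":
    for name, split in SPLITS.items():
        print(name, "training images:", train_size(split))
    folds = one_k_folds(SPLITS["MS-COCO"]["test"])
    print("MS-COCO 1K folds:", len(folds))
```

Under these assumptions, Flickr30K leaves 29,000 images for training once the validation and test images are removed, and the three MS-COCO splits sum to the stated 123,287 images.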