UNISON: Unpaired Cross-Lingual Image Captioning
Authors: Jiahui Gao, Yi Zhou, Philip L. H. Yu, Shafiq Joty, Jiuxiang Gu (pp. 10654-10662)
AAAI 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We verify the effectiveness of our proposed method on the Chinese image caption generation task. The comparisons against several existing methods demonstrate the effectiveness of our approach. ... Our experiments show 1) the effectiveness of the proposed HGM when conducting cross-lingual alignment (§5.2) in the scene graph encoding space and 2) the superior performance of our UNISON framework as a whole (§5.1). |
| Researcher Affiliation | Collaboration | 1The University of Hong Kong, Hong Kong 2Johns Hopkins University, USA 3The Education University of Hong Kong, Hong Kong 4Nanyang Technological University, Singapore 5Adobe Research, USA |
| Pseudocode | No | The paper describes its methods using equations and textual explanations of the framework phases, but does not include formal pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide an explicit statement or link indicating the release of its source code for the described methodology. A mention of 'Code is acquired from the first author of (Gu et al. 2019)' refers to a comparison method's code, not their own. |
| Open Datasets | Yes | For cross-lingual auto-encoding, we collect a paired English-Chinese corpus from existing MT datasets, including WMT19 (Barrault et al. 2019), AIC MT (Wu et al. 2017), UM (Tian et al. 2014), and Trans-zh (Brightmart 2019)1. ... For the second phase, following Li et al. (2019), we use 18,341 training images from MSCOCO and randomly select 18,341 Chinese sentences from the training split of the MT corpus. |
| Dataset Splits | Yes | For the first phase, we use 151,613 sentence pairs for training, 5,000 sentence pairs for validation, and 5,000 pairs for testing. |
| Hardware Specification | No | The paper does not specify the hardware details, such as GPU or CPU models, processor types, or memory amounts, used for running the experiments. |
| Software Dependencies | No | The paper mentions software components like MOTIFS, Jieba, and Adam for its implementation and experiments, but it does not provide specific version numbers for these or other software dependencies. |
| Experiment Setup | Yes | During cross-lingual auto-encoding phase, we set the dimension of scene graph embeddings to 1,000 and d_c to 100. LSTM with 2 layers is adopted to construct the decoder, whose hidden size is 1000. ... We optimize the model with Adam, batch size of 50, and learning rate of 5 × 10^-5. ... We set λ to 10. During inference, we use beam search with a beam size of 5. |
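The inference step reported above (beam search with a beam size of 5) can be illustrated with a minimal sketch. This is not the paper's implementation; `toy_model` is a hypothetical stand-in for the decoder's per-step log-probability distribution, and the beam size matches the reported value of 5.

```python
import math

def beam_search(step_log_probs, beam_size=5, max_len=3):
    """Minimal beam search sketch.

    step_log_probs(prefix) -> dict mapping next token -> log-probability.
    At each step, all beams are expanded with every candidate token, and
    only the `beam_size` highest-scoring prefixes are kept.
    """
    beams = [((), 0.0)]  # (token tuple, cumulative log-probability)
    for _ in range(max_len):
        candidates = []
        for prefix, score in beams:
            for tok, lp in step_log_probs(prefix).items():
                candidates.append((prefix + (tok,), score + lp))
        # Keep the top-k expansions by cumulative log-probability.
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_size]
    return beams

# Hypothetical toy model: the same next-token distribution at every step.
def toy_model(prefix):
    return {"a": math.log(0.6), "b": math.log(0.3), "c": math.log(0.1)}

best = beam_search(toy_model, beam_size=5, max_len=2)[0]
print(best[0])  # ('a', 'a') — the highest-probability sequence
```

With a real captioning decoder, `step_log_probs` would run the 2-layer LSTM over the prefix and return the vocabulary distribution for the next token; the pruning logic is unchanged.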