Linking People across Text and Images Based on Social Relation Reasoning

Authors: Yang Lei, Peizhi Zhao, Pijian Li, Yi Cai, Qingbao Huang

AAAI 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results show that the accuracy of our proposed SRR model outperforms the state-of-the-art models on the challenging datasets Who's Waldo and FL-MSRE by more than 5% and 7%, respectively.
Researcher Affiliation | Academia | (1) School of Electrical Engineering, Guangxi University, Nanning, China; (2) Guangxi Key Laboratory of Multimedia Communications and Network Technology; (3) School of Software Engineering, South China University of Technology, Guangzhou, China; (4) Key Laboratory of Big Data and Intelligent Robot (SCUT), MOE of China; (5) Peng Cheng Laboratory, Shenzhen, China. Emails: {2012391019, 2112391073, 1912302005}@st.gxu.edu.cn, ycai@scut.edu.cn, qbhuang@gxu.edu.cn
Pseudocode | No | No pseudocode or algorithm blocks were found in the paper.
Open Source Code | Yes | Our source code is available at https://github.com/VILAN-Lab/SRR.
Open Datasets | Yes | Who's Waldo (Cui et al. 2021) consists of 19.2K image-sentence pairs collected from Wikimedia Commons, which are split into 17.9K training, 6.7K validation, and 6.7K test image-sentence pairs. It was originally designed for linking people across text and images and is currently the largest dataset for this task. FL-MSRE (Wan et al. 2021) consists of 3,716 images and 6,485 sentences.
Dataset Splits | Yes | Who's Waldo (Cui et al. 2021) consists of 19.2K image-sentence pairs collected from Wikimedia Commons, which are split into 17.9K training, 6.7K validation, and 6.7K test image-sentence pairs.
Hardware Specification | No | No specific hardware details (such as GPU/CPU models, memory, or machine names) used for running experiments are mentioned in the paper.
Software Dependencies | No | The paper mentions software components such as Faster R-CNN, GloVe, and the Adam optimizer but does not specify their version numbers.
Experiment Setup | Yes | Specifically, during pretraining of the SRE module, the learning rate is set to 1e-3 and the dropout rate to 0.5 for a total of 80 epochs. Inside the SRE module, the number of multi-head attention heads is set to 6, with dimension 300 per head, in the 3-layer transformer encoder. The number of GCN layers is set to 2. Fully connected layers share the same dropout rate of 0.3. Gradients are clipped to 0.25. Batch size is set to 8. The Adam optimizer (Kingma and Ba 2015) is used with an initial learning rate of 2e-4, which is halved every 10 epochs for a total of 50 epochs.
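The hyperparameters quoted in the experiment-setup row can be collected into a plain-Python sketch, including the step-decay schedule (initial learning rate 2e-4, halved every 10 epochs over 50 epochs). The dict keys and the function `scheduled_lr` are illustrative names, not taken from the paper's released code.

```python
# Hyperparameters as reported in the paper's experiment setup.
SRE_PRETRAIN = {"lr": 1e-3, "dropout": 0.5, "epochs": 80}

MAIN_TRAINING = {
    "init_lr": 2e-4,       # Adam initial learning rate
    "batch_size": 8,
    "grad_clip": 0.25,     # gradient clipping threshold
    "epochs": 50,
    "attn_heads": 6,       # multi-head attention heads
    "head_dim": 300,       # dimension per head
    "encoder_layers": 3,   # transformer encoder layers
    "gcn_layers": 2,
    "fc_dropout": 0.3,     # dropout for fully connected layers
}

def scheduled_lr(epoch: int, init_lr: float = 2e-4, step: int = 10) -> float:
    """Step decay: learning rate halved every `step` epochs."""
    return init_lr * 0.5 ** (epoch // step)
```

For example, `scheduled_lr(0)` returns 2e-4, `scheduled_lr(10)` returns 1e-4, and `scheduled_lr(25)` returns 5e-5; in a PyTorch setting the same schedule would typically be expressed with `torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.5)`.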