Deep Learning Based Multi-modal Addressee Recognition in Visual Scenes with Utterances

Authors: Thao Le Minh, Nobuyuki Shimizu, Takashi Miyazaki, Koichi Shinoda

IJCAI 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conducted experiments to demonstrate the effectiveness of our proposed model as well as the benefit of our dataset. We compared our proposed model against the two unimodal recognition models for addressee recognition, as shown in Table 3. There were 369,306 utterances and corresponding images used for training; 123,102 for testing and the remaining 123,102 as the validation set for adjusting the classifier.
Researcher Affiliation | Collaboration | 1 Tokyo Institute of Technology, Tokyo, Japan; 2 Yahoo Japan Corporation
Pseudocode | No | The paper includes a 'Network Architecture' diagram (Figure 2) but does not contain any pseudocode or clearly labeled algorithm blocks.
Open Source Code | No | The paper states: 'Our ARVSU dataset will be released at https://research-lab.yahoo.co.jp/en/software/.' This refers only to the dataset, not to open-source code for the described methodology.
Open Datasets | Yes | we created a mock dataset called Addressee Recognition in Visual Scenes with Utterances (ARVSU). Our ARVSU dataset will be released at https://research-lab.yahoo.co.jp/en/software/.
Dataset Splits | Yes | There were 369,306 utterances and corresponding images used for training; 123,102 for testing and the remaining 123,102 as the validation set for adjusting the classifier.
Hardware Specification | No | The paper does not provide specific details about the hardware used, such as GPU or CPU models. It only mentions software frameworks like Keras and TensorFlow.
Software Dependencies | No | The paper states: 'The proposed model was implemented using Keras 1 with TensorFlow backend.' While 'Keras 1' is a specific major version, the version of TensorFlow is not provided, and only one component has a version specified.
Experiment Setup | Yes | The learning rate was set to 0.001 and the batch size was set to 64.
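
To make the Experiment Setup, Software Dependencies, and Dataset Splits rows concrete, the following is a minimal Keras (TensorFlow backend) training sketch that uses the reported learning rate of 0.001 and batch size of 64, with the reported split sizes noted in comments (369,306 / 123,102 / 123,102, i.e. an exact 60/20/20 split of 615,510 examples). The two-branch architecture, feature dimensions, optimizer choice (Adam), and number of addressee classes are illustrative assumptions and are not specified in the quoted passages; the paper also used an earlier Keras release than the tf.keras API shown here.

```python
# Hedged sketch of the reported setup: learning rate 0.001, batch size 64,
# Keras with a TensorFlow backend. Architecture, feature sizes, optimizer,
# and class count below are ASSUMPTIONS, not taken from the paper.
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, Model

NUM_CLASSES = 3        # assumed number of addressee classes
IMG_FEAT_DIM = 2048    # assumed pre-extracted image feature size
UTT_FEAT_DIM = 300     # assumed utterance embedding size

# Two modality inputs: an image feature vector and an utterance feature vector.
image_in = layers.Input(shape=(IMG_FEAT_DIM,), name="image_features")
utter_in = layers.Input(shape=(UTT_FEAT_DIM,), name="utterance_features")

# Fuse the modalities by concatenation and classify (illustrative only).
fused = layers.Concatenate()([image_in, utter_in])
hidden = layers.Dense(256, activation="relu")(fused)
output = layers.Dense(NUM_CLASSES, activation="softmax")(hidden)

model = Model(inputs=[image_in, utter_in], outputs=output)

# Reported hyperparameters: learning rate 0.001, batch size 64.
# The choice of Adam is an assumption; the quoted text does not name the optimizer.
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)

# Dummy data standing in for the reported split sizes
# (369,306 train / 123,102 validation / 123,102 test); small counts used here.
x_img = np.random.rand(640, IMG_FEAT_DIM).astype("float32")
x_utt = np.random.rand(640, UTT_FEAT_DIM).astype("float32")
y = np.random.randint(0, NUM_CLASSES, size=(640,))

model.fit([x_img, x_utt], y, batch_size=64, epochs=1, validation_split=0.2)
```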