Deep Learning Based Multi-modal Addressee Recognition in Visual Scenes with Utterances
Authors: Thao Le Minh, Nobuyuki Shimizu, Takashi Miyazaki, Koichi Shinoda
IJCAI 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conducted experiments to demonstrate the effectiveness of our proposed model as well as the benefit of our dataset. We compared our proposed model against the two unimodal recognition models for addressee recognition, as shown in Table 3. There were 369,306 utterances and corresponding images used for training; 123,102 for testing and the remaining 123,102 as the validation set for adjusting the classifier. |
| Researcher Affiliation | Collaboration | 1 Tokyo Institute of Technology, Tokyo, Japan 2 Yahoo Japan Corporation |
| Pseudocode | No | The paper includes a 'Network Architecture' diagram (Figure 2) but does not contain any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | No | The paper states: 'Our ARVSU dataset will be released at https://research-lab.yahoo.co.jp/en/software/.' This refers only to the dataset, not to source code for the described methodology. |
| Open Datasets | Yes | We created a mock dataset called Addressee Recognition in Visual Scenes with Utterances (ARVSU). Our ARVSU dataset will be released at https://research-lab.yahoo.co.jp/en/software/. |
| Dataset Splits | Yes | There were 369,306 utterances and corresponding images used for training; 123,102 for testing and the remaining 123,102 as the validation set for adjusting the classifier. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used, such as GPU or CPU models. It mentions only the software frameworks Keras and TensorFlow. |
| Software Dependencies | No | The paper states: 'The proposed model was implemented using Keras 1 with TensorFlow backend.' While 'Keras 1' is a specific major version, the version for TensorFlow is not provided, and only one component has a version specified. |
| Experiment Setup | Yes | The learning rate was set to 0.001 and the batch size was set to 64. |
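
The split sizes quoted in the Dataset Splits row can be checked with a few lines of arithmetic. The following is a minimal sketch, not code from the paper: the dictionary keys and the `split_fractions` helper are illustrative names, and only the counts, learning rate, and batch size are taken from the table above.

```python
# Split sizes as quoted in the "Dataset Splits" row (utterance/image pairs).
SPLITS = {"train": 369_306, "test": 123_102, "validation": 123_102}

# Hyperparameters quoted in the "Experiment Setup" row.
# The key names are illustrative; the paper's own configuration format is not given.
SETUP = {"learning_rate": 0.001, "batch_size": 64}


def split_fractions(splits):
    """Return each split's share of the total number of examples."""
    total = sum(splits.values())
    return {name: count / total for name, count in splits.items()}


total = sum(SPLITS.values())
fractions = split_fractions(SPLITS)

# Print each split with its absolute size and percentage of the whole dataset.
for name, frac in fractions.items():
    print(f"{name}: {SPLITS[name]:,} ({frac:.1%})")
print(f"total: {total:,}")
```

Under these numbers the three splits sum to 615,510 examples, which corresponds to a 60/20/20 division between training, testing, and validation.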