Detecting and Grounding Important Characters in Visual Stories
Authors: Danyang Liu, Frank Keller
AAAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | For both tasks, we develop simple, unsupervised models based on distributional similarity and pre-trained vision-and-language models. Our new dataset, together with these models, can serve as the foundation for subsequent work on analysing and generating stories from a character-centric perspective. [From the Experiments, Character Detection section:] We use the spaCy PoS tagger (https://spacy.io/api/tagger) to obtain the nouns in text and then identify the characters using WordNet (Miller 1995). For images, we obtain the faces using MTCNN (Zhang et al. 2016) and resize them to 160×160. Table 3 shows the results of character detection. (A hedged sketch of this detection pipeline follows the table.) |
| Researcher Affiliation | Academia | Danyang Liu, Frank Keller; Institute for Language, Cognition and Computation, School of Informatics, University of Edinburgh; danyang.liu@ed.ac.uk, keller@inf.ed.ac.uk |
| Pseudocode | Yes | Algorithm 1: Distributional Similarity-based Alignment. Input: textual chains {T_i}_{i=1}^{K_t} and visual chains {V_i}_{i=1}^{K_v}. Output: alignment results R. Step 1: let R = [] and A = a zero matrix of shape (K_t, K_v). (A runnable sketch of this algorithm follows the table.) |
| Open Source Code | No | We will release the dataset and codebase of the project, and hope this will benefit work in character-centric story understanding and generation. The paper states that the code *will be released* in the future, but does not provide immediate access or a specific link. |
| Open Datasets | Yes | In this paper, we introduce the VIST-Character dataset, which augments the test set of VIST (the Visual Storytelling dataset; Huang et al. 2016) with rich character-related annotation. |
| Dataset Splits | No | The paper mentions that the VIST-Character dataset augments the 'test set of VIST (the Visual Storytelling dataset; Huang et al. 2016)', but does not provide explicit details about train, validation, and test splits for the models developed or evaluated within the paper. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for running its experiments. |
| Software Dependencies | No | The paper mentions software tools like spaCy, NeuralCoref, SpanBERT, MTCNN, CLIP, and Label Studio, but does not provide version numbers for any of them, which would be necessary for reproducibility. |
| Experiment Setup | No | The paper describes the general approach and methods used (e.g., k-means clustering and pre-trained models such as SpanBERT and CLIP), but it does not provide specific experimental setup details such as hyperparameters (learning rates, batch sizes, number of epochs) or other training configurations required for reproducibility. (An illustrative CLIP scoring sketch follows the table.) |
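The detection pipeline quoted in the Research Type row is concrete enough to sketch. The following is a minimal, hedged reconstruction, not the authors' code: the spaCy model name, the WordNet person/animal hypernym heuristic, and the use of facenet-pytorch's MTCNN are all assumptions; only the tool names, the noun-then-WordNet ordering, and the 160×160 face size come from the paper.

```python
# Minimal sketch of the character-detection pipeline quoted above,
# assuming en_core_web_sm for spaCy, a WordNet person/animal hypernym
# heuristic, and facenet-pytorch's MTCNN; none of these specifics are
# confirmed by the paper.
import spacy
from nltk.corpus import wordnet as wn  # may require nltk.download("wordnet")
from facenet_pytorch import MTCNN
from PIL import Image

nlp = spacy.load("en_core_web_sm")  # assumed model; the paper only cites spaCy

PERSON = wn.synset("person.n.01")
ANIMAL = wn.synset("animal.n.01")

def is_character(noun: str) -> bool:
    """Heuristic (assumed): a noun denotes a character if any of its
    WordNet noun senses has person.n.01 or animal.n.01 as a hypernym."""
    for synset in wn.synsets(noun, pos=wn.NOUN):
        hypernyms = set(synset.closure(lambda s: s.hypernyms()))
        if PERSON in hypernyms or ANIMAL in hypernyms:
            return True
    return False

def detect_text_characters(story: str) -> list[str]:
    """Nouns via the spaCy PoS tagger, filtered through WordNet."""
    doc = nlp(story)
    return [tok.text for tok in doc
            if tok.pos_ == "NOUN" and is_character(tok.lemma_)]

# image_size=160 matches the 160x160 face crops described in the paper.
mtcnn = MTCNN(image_size=160, keep_all=True)

def detect_faces(image_path: str):
    """Face crops via MTCNN; returns a (possibly empty) list of tensors."""
    img = Image.open(image_path).convert("RGB")
    faces = mtcnn(img)  # tensor of shape (n_faces, 3, 160, 160), or None
    return [] if faces is None else list(faces)
```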
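Only the header and first line of Algorithm 1 are quoted in the Pseudocode row, so everything past the initialisation of R and A in the sketch below is an assumption: a pairwise similarity matrix filled from a caller-supplied distributional similarity function, followed by a greedy one-to-one selection of the highest-scoring chain pairs.

```python
# Sketch of Algorithm 1; only the initialisation of R and A is quoted,
# so the similarity fill and the greedy one-to-one selection below are
# assumptions about how the alignment proceeds.
import numpy as np

def align_chains(text_chains, visual_chains, similarity):
    """Align textual chains to visual chains via a pairwise similarity
    matrix A of shape (K_t, K_v), greedily taking the best pair first."""
    K_t, K_v = len(text_chains), len(visual_chains)
    R = []                                # alignment results, as in Algorithm 1
    A = np.zeros((K_t, K_v))              # zero matrix, as in Algorithm 1
    for i in range(K_t):
        for j in range(K_v):
            A[i, j] = similarity(text_chains[i], visual_chains[j])
    while A.size and A.max() > -np.inf:
        i, j = np.unravel_index(A.argmax(), A.shape)
        R.append((int(i), int(j)))
        A[i, :] = -np.inf                 # each textual chain used at most once
        A[:, j] = -np.inf                 # each visual chain used at most once
    return R
```

Under a distributional reading, `similarity` would compare, for example, an averaged embedding of the mentions in a textual chain against an averaged face embedding for a visual chain; the paper's exact choice is not quoted here.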
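The Experiment Setup row notes that CLIP is used but unconfigured in the paper. As an illustration only, here is one plausible way to score a textual character mention against candidate face crops with the Hugging Face CLIP API; the checkpoint name and the "a photo of ..." prompt template are assumptions, not the authors' setup.

```python
# Illustration only: scoring one textual character mention against face
# crops with CLIP via Hugging Face transformers. The checkpoint and the
# prompt template are assumptions; the paper does not specify its CLIP
# configuration.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def score_mention_against_faces(mention: str,
                                faces: list[Image.Image]) -> torch.Tensor:
    """Return one CLIP image-text similarity score per face crop."""
    inputs = processor(text=[f"a photo of {mention}"], images=faces,
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        outputs = model(**inputs)
    return outputs.logits_per_image.squeeze(-1)  # shape: (n_faces,)
```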