Detecting and Grounding Important Characters in Visual Stories
Authors: Danyang Liu, Frank Keller
AAAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | For both tasks, we develop simple, unsupervised models based on distributional similarity and pre-trained vision-and-language models. Our new dataset, together with these models, can serve as the foundation for subsequent work on analysing and generating stories from a character-centric perspective. [From the Experiments, Character Detection section:] We use the spaCy PoS tagger (https://spacy.io/api/tagger) to obtain the nouns in text and then identify the characters using WordNet (Miller 1995). For images, we obtain the faces using MTCNN (Zhang et al. 2016) and resize them to 160×160. Table 3 shows the results of character detection. (A hedged sketch of this detection pipeline follows the table.) |
| Researcher Affiliation | Academia | Danyang Liu, Frank Keller; Institute for Language, Cognition and Computation, School of Informatics, University of Edinburgh; danyang.liu@ed.ac.uk, keller@inf.ed.ac.uk |
| Pseudocode | Yes | Algorithm 1: Distributional Similarity-based Alignment. Input: textual chains {T_i}_{i=1}^{K_t} and visual chains {V_i}_{i=1}^{K_v}. Output: alignment results R. Step 1: let R = [] and A = a zero matrix of shape (K_t, K_v). (A runnable sketch of this algorithm follows the table.) |
| Open Source Code | No | We will release the dataset and codebase of the project, and hope this will benefit work in character-centric story understanding and generation. The paper states that the code *will be released* in the future, but does not provide immediate access or a specific link. |
| Open Datasets | Yes | In this paper, we introduce the VIST-Character dataset, which augments the test set of VIST (the Visual Storytelling dataset; Huang et al. 2016) with rich character-related annotation. |
| Dataset Splits | No | The paper mentions that the VIST-Character dataset augments the 'test set of VIST (the Visual Storytelling dataset; Huang et al. 2016)', but does not provide explicit details about train, validation, and test splits for the models developed or evaluated within the paper. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for running its experiments. |
| Software Dependencies | No | The paper mentions software tools like spaCy, NeuralCoref, SpanBERT, MTCNN, CLIP, and Label Studio, but does not provide version numbers for any of them, which would be necessary for reproducibility. |
| Experiment Setup | No | The paper describes the general approach and methods used (e.g., k-means clustering and pre-trained models such as SpanBERT and CLIP), but it does not provide specific experimental setup details such as hyperparameters (learning rates, batch sizes, number of epochs) or other training configurations required for reproducibility. (An illustrative CLIP scoring sketch follows the table.) |
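The detection pipeline quoted in the Research Type row is concrete enough to sketch. The following is a minimal, hedged reconstruction, not the authors' code: the spaCy model name, the WordNet person/animal hypernym heuristic, and the use of facenet-pytorch's MTCNN are all assumptions; only the tool names, the noun-then-WordNet ordering, and the 160×160 face size come from the paper.

```python
# Minimal sketch of the character-detection pipeline quoted above,
# assuming en_core_web_sm for spaCy, a WordNet person/animal hypernym
# heuristic, and facenet-pytorch's MTCNN; none of these specifics are
# confirmed by the paper.
import spacy
from nltk.corpus import wordnet as wn  # may require nltk.download("wordnet")
from facenet_pytorch import MTCNN
from PIL import Image

nlp = spacy.load("en_core_web_sm")  # assumed model; the paper only cites spaCy

PERSON = wn.synset("person.n.01")
ANIMAL = wn.synset("animal.n.01")

def is_character(noun: str) -> bool:
    """Heuristic (assumed): a noun denotes a character if any of its
    WordNet noun senses has person.n.01 or animal.n.01 as a hypernym."""
    for synset in wn.synsets(noun, pos=wn.NOUN):
        hypernyms = set(synset.closure(lambda s: s.hypernyms()))
        if PERSON in hypernyms or ANIMAL in hypernyms:
            return True
    return False

def detect_text_characters(story: str) -> list[str]:
    """Nouns via the spaCy PoS tagger, filtered through WordNet."""
    doc = nlp(story)
    return [tok.text for tok in doc
            if tok.pos_ == "NOUN" and is_character(tok.lemma_)]

# image_size=160 matches the 160x160 face crops described in the paper.
mtcnn = MTCNN(image_size=160, keep_all=True)

def detect_faces(image_path: str):
    """Face crops via MTCNN; returns a (possibly empty) list of tensors."""
    img = Image.open(image_path).convert("RGB")
    faces = mtcnn(img)  # tensor of shape (n_faces, 3, 160, 160), or None
    return [] if faces is None else list(faces)
```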
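Only the header and first line of Algorithm 1 are quoted in the Pseudocode row, so everything past the initialisation of R and A in the sketch below is an assumption: a pairwise similarity matrix filled from a caller-supplied distributional similarity function, followed by a greedy one-to-one selection of the highest-scoring chain pairs.

```python
# Sketch of Algorithm 1; only the initialisation of R and A is quoted,
# so the similarity fill and the greedy one-to-one selection below are
# assumptions about how the alignment proceeds.
import numpy as np

def align_chains(text_chains, visual_chains, similarity):
    """Align textual chains to visual chains via a pairwise similarity
    matrix A of shape (K_t, K_v), greedily taking the best pair first."""
    K_t, K_v = len(text_chains), len(visual_chains)
    R = []                                # alignment results, as in Algorithm 1
    A = np.zeros((K_t, K_v))              # zero matrix, as in Algorithm 1
    for i in range(K_t):
        for j in range(K_v):
            A[i, j] = similarity(text_chains[i], visual_chains[j])
    while A.size and A.max() > -np.inf:
        i, j = np.unravel_index(A.argmax(), A.shape)
        R.append((int(i), int(j)))
        A[i, :] = -np.inf                 # each textual chain used at most once
        A[:, j] = -np.inf                 # each visual chain used at most once
    return R
```

Under a distributional reading, `similarity` would compare, for example, an averaged embedding of the mentions in a textual chain against an averaged face embedding for a visual chain; the paper's exact choice is not quoted here.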
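The Experiment Setup row notes that CLIP is used but unconfigured in the paper. As an illustration only, here is one plausible way to score a textual character mention against candidate face crops with the Hugging Face CLIP API; the checkpoint name and the "a photo of ..." prompt template are assumptions, not the authors' setup.

```python
# Illustration only: scoring one textual character mention against face
# crops with CLIP via Hugging Face transformers. The checkpoint and the
# prompt template are assumptions; the paper does not specify its CLIP
# configuration.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def score_mention_against_faces(mention: str,
                                faces: list[Image.Image]) -> torch.Tensor:
    """Return one CLIP image-text similarity score per face crop."""
    inputs = processor(text=[f"a photo of {mention}"], images=faces,
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        outputs = model(**inputs)
    return outputs.logits_per_image.squeeze(-1)  # shape: (n_faces,)
```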