Feature Deformation Meta-Networks in Image Captioning of Novel Objects

Authors: Tingjia Cao, Ke Han, Xiaomei Wang, Lin Ma, Yanwei Fu, Yu-Gang Jiang, Xiangyang Xue (pp. 10494-10501)

AAAI 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments are conducted on the widely used novel object captioning dataset, and the results show the effectiveness of our FDM-net. Ablation study and qualitative visualization further give insights of our model.
Researcher Affiliation | Collaboration | ¹Shanghai Key Lab of Intelligent Information Processing, School of Computer Science, Fudan University; ²School of Data Science, and MOE Frontiers Center for Brain Science, Fudan University; ³Tencent AI Lab
Pseudocode | No | The paper describes the steps of the method in paragraph text but does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any explicit statements about releasing source code or links to a code repository.
Open Datasets | Yes | We follow the novel object captioning split (NOC split) introduced by (Anne Hendricks et al. 2016) to evaluate our proposed method. It comes from the standard split of MSCOCO 2014 (Chen et al. 2015) that contains 120K images, and each image is labelled with five human-annotated sentences. ... To evaluate the expandability of our method, we also conduct experiments on the Open Image dataset, a large-scale dataset.
Dataset Splits | Yes | In the standard validation dataset, half of the pairs are randomly selected into the new validation set, and the others are selected into the test set.
Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers.
Experiment Setup | Yes | In our model, we use the pre-trained bottom-up attention model to extract visual features. To make a fair comparison, we use traditional cross-entropy loss during training. The RoI features of novel objects come from Open Image. Specifically, we use the mis-labelled probability strategy (MLS) to select top similar seen objects for each unseen object. As shown in Tab. 1, the top three similar seen objects are considered to conduct the replacing work with their corresponding novel objects. That means we set k = 3. Besides, the constrained beam search (CBS) algorithm (Koehn 2016) is also applied in the test and validation stage. For ensuring the diversity of our augmented dataset, we extract 100 novel object features as resources for the following replacement.
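The Experiment Setup row describes selecting, for each novel object, the k = 3 seen objects with the highest mis-labelled probability (MLS) as candidates for feature replacement. A minimal sketch of that selection step, assuming a precomputed (novel x seen) mis-labelling probability matrix; the function name, inputs, and shapes are illustrative, not the authors' code:

```python
def topk_similar_seen(mislabel_prob, k=3):
    """For each novel object (a row of probabilities over seen classes),
    return the indices of the k seen objects it is most often mis-labelled
    as. Illustrative sketch of the MLS selection with the paper's k = 3.
    `mislabel_prob` is a list of rows, one row per novel object."""
    return [
        sorted(range(len(row)), key=lambda j: -row[j])[:k]
        for row in mislabel_prob
    ]

# Toy example: 2 hypothetical novel objects vs. 4 seen classes.
p = [[0.1, 0.6, 0.2, 0.1],
     [0.4, 0.1, 0.1, 0.3]]
print(topk_similar_seen(p, k=2))  # → [[1, 2], [0, 3]]
```

The selected seen-object indices would then drive the replacement step the row describes: swapping seen-object RoI features with novel-object features drawn from the pool of 100 extracted features.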