Feature Deformation Meta-Networks in Image Captioning of Novel Objects
Authors: Tingjia Cao, Ke Han, Xiaomei Wang, Lin Ma, Yanwei Fu, Yu-Gang Jiang, Xiangyang Xue
AAAI 2020, pp. 10494-10501 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments are conducted on the widely used novel object captioning dataset, and the results show the effectiveness of our FDM-net. Ablation study and qualitative visualization further give insights into our model. |
| Researcher Affiliation | Collaboration | (1) Shanghai Key Lab of Intelligent Information Processing, School of Computer Science, Fudan University; (2) School of Data Science, and MOE Frontiers Center for Brain Science, Fudan University; (3) Tencent AI Lab |
| Pseudocode | No | The paper describes the steps of the method in paragraph text but does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any explicit statements about releasing source code or links to a code repository. |
| Open Datasets | Yes | We follow the novel object captioning split (NOC split) introduced by (Anne Hendricks et al. 2016) to evaluate our proposed method. It comes from the standard split of MSCOCO 2014 (Chen et al. 2015) that contains 120K images, and each image is labelled with five human-annotated sentences. ... To evaluate the expandability of our method, we also conduct experiments on the Open Image dataset, a large-scale dataset. |
| Dataset Splits | Yes | In the standard validation dataset, half of the pairs are randomly selected into a new validation set, and the others are selected into the test set. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers. |
| Experiment Setup | Yes | In our model, we use the pre-trained bottom-up attention model to extract visual features. To make a fair comparison, we use traditional cross-entropy loss during training. The RoI features of novel objects come from Open Image. Specifically, we use the mis-labelled probability strategy (MLS) to select the top similar seen objects for each unseen object. As shown in Tab. 1, the top three similar seen objects are considered to conduct the replacement with their corresponding novel objects. That means we set k = 3. Besides, the constrained beam search (CBS) algorithm (Koehn 2016) is also applied in the test and validation stages. To ensure the diversity of our augmented dataset, we extract 100 novel object features as resources for the subsequent replacement. |
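
Since the paper releases no code (see the Open Source Code row above), the following is a minimal Python sketch of how we read the mis-labelled probability strategy (MLS) from the quoted setup: a classifier pre-trained on seen classes scores each novel object's RoI feature, and the k seen classes it is most likely to be mis-labelled as are taken as that object's similar seen objects. The linear classifier, the 2048-d feature size, and all names here are our assumptions, not the authors' implementation.

```python
import numpy as np

def mls_top_k(novel_feats, w_seen, b_seen, k=3):
    """Sketch of the mis-labelled probability strategy (MLS).

    novel_feats: (n_novel, d) RoI features of novel objects.
    w_seen, b_seen: parameters of a classifier pre-trained on seen
        classes (assumed linear here purely for illustration).
    Returns the indices of the k seen classes each novel object is most
    likely to be mis-labelled as, i.e. its top-k similar seen objects.
    """
    logits = novel_feats @ w_seen + b_seen        # (n_novel, n_seen)
    logits -= logits.max(axis=1, keepdims=True)   # numerically stable softmax
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)
    return np.argsort(-probs, axis=1)[:, :k]      # k = 3 in the paper

# Toy usage with random stand-ins for the pre-trained classifier.
rng = np.random.default_rng(0)
feats = rng.normal(size=(5, 2048))                 # 5 novel objects
w = rng.normal(size=(2048, 80))                    # 80 hypothetical seen classes
print(mls_top_k(feats, w, np.zeros(80)))           # (5, 3) seen class indices
```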
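
Likewise, a hedged sketch of the replacement step the setup describes: seen-object RoI features in a training image are swapped for features sampled from the paired novel object's pool of 100 extracted features, which is where the stated diversity comes from. The seen-to-novel mapping, pool layout, and helper names are hypothetical, and the paper's accompanying word-level replacement in the caption is omitted.

```python
import numpy as np

def replace_seen_with_novel(image_feats, region_classes, seen_to_novel,
                            novel_pools, rng):
    """Build one augmented sample by swapping seen-object RoI features
    for features of the paired novel object.

    image_feats: (n_regions, d) bottom-up RoI features of one image.
    region_classes: seen class id for each region.
    seen_to_novel: hypothetical map from a seen class to the novel
        class that MLS paired it with.
    novel_pools: novel class id -> (100, d) pool of extracted features.
    """
    augmented = image_feats.copy()
    for i, cls in enumerate(region_classes):
        novel_cls = seen_to_novel.get(int(cls))
        if novel_cls is not None:
            pool = novel_pools[novel_cls]
            augmented[i] = pool[rng.integers(len(pool))]  # sample for diversity
    return augmented

# Toy usage: regions of seen class 3 get features of novel class 7.
rng = np.random.default_rng(0)
feats = rng.normal(size=(36, 2048))            # 36 RoIs, as in bottom-up attention
classes = rng.integers(0, 5, size=36)          # seen class id per region
pools = {7: rng.normal(size=(100, 2048))}      # 100 features per novel object
aug = replace_seen_with_novel(feats, classes, {3: 7}, pools, rng)
```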