Unsupervised Vision-Language Grammar Induction with Shared Structure Modeling

Authors: Bo Wan, Wenjuan Han, Zilong Zheng, Tinne Tuytelaars

ICLR 2022

Reproducibility assessment (variable: result, followed by the LLM response):
Research Type: Experimental. We introduce a new evaluation metric, the Critical Concept Recall Rate (CCRR), to explicitly evaluate VL grammar induction, and show a 2.6% improvement over a strong baseline on Flickr30k Entities. We also evaluate our model via two derived tasks, i.e., language grammar induction and phrase grounding, and improve over the state of the art for both.
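The report quotes the 2.6% CCRR gain without reproducing the metric's formula. A minimal sketch, assuming CCRR is the fraction of annotated critical-concept spans that are recovered as constituents of the induced parse (function and variable names are illustrative, not the authors' code):

    def ccrr(predicted_spans, critical_concepts):
        """Critical Concept Recall Rate: fraction of gold critical-concept
        spans recovered as constituents of the induced parse tree.
        Illustrative reading of the metric, not the paper's implementation."""
        predicted = set(predicted_spans)  # spans as (start, end) token indices
        hits = sum(1 for span in critical_concepts if span in predicted)
        return hits / len(critical_concepts) if critical_concepts else 0.0

    # Toy example: 2 of 3 gold concept spans are recovered, so CCRR ≈ 0.667.
    print(ccrr([(0, 2), (3, 5), (0, 5)], [(0, 2), (3, 5), (2, 4)]))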
Researcher Affiliation: Collaboration. Bo Wan (1), Wenjuan Han (2), Zilong Zheng (2), Tinne Tuytelaars (1); 1. Department of Electrical Engineering, KU Leuven; 2. Beijing Institute for General Artificial Intelligence, Beijing, China.
Pseudocode: No. The paper describes its model and algorithms using text, equations, and diagrams (Figures 2, 6, 7, 8, and 9), but it does not include a distinct pseudocode block or a clearly labeled algorithm.
Open Source Code: Yes. All the code, processed data, and trained models in this paper are publicly released at https://github.com/bobwan1995/cliora.git.
Open Datasets: Yes. We evaluate our method on the Flickr30k Entities (Plummer et al., 2017) and MSCOCO (Lin et al., 2014) datasets.
Dataset Splits: Yes. Flickr30k Entities contains 29,783 images for training, 1,000 for validation, and 1,000 for testing. We use the same split of MSCOCO as Zhao & Titov (2020b): 82,783 training images, 1,000 validation images, and 1,000 test images.
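As a usage sketch only: the counts above pin down the split sizes, so a hypothetical helper that carves a shuffled id list into those portions could look as follows (the shuffle and seed are assumptions; the paper reuses the fixed split of Zhao & Titov (2020b) rather than resampling):

    import random

    def split_ids(image_ids, n_val=1000, n_test=1000, seed=0):
        """Carve image ids into train/val/test by the quoted counts
        (e.g. MSCOCO: 82,783 train / 1,000 val / 1,000 test).
        Illustrative only; the actual split files come from prior work."""
        ids = list(image_ids)
        random.Random(seed).shuffle(ids)
        val, test = ids[:n_val], ids[n_val:n_val + n_test]
        train = ids[n_val + n_test:]
        return train, val, test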
Hardware Specification: No. The paper does not provide any specific details about the hardware used to run the experiments, such as GPU models, CPU types, or memory specifications.
Software Dependencies: No. The paper mentions several tools and libraries, such as Faster R-CNN, RoI-Align, ELMo, GloVe embeddings, and Benepar, but it does not specify version numbers for any of these components, nor for broader frameworks such as PyTorch or TensorFlow.
Experiment Setup: Yes. We load DIORA as an initialization for CLIORA. Other detailed hyper-parameters are provided in Appendix F. Table 5: λ = 0.5, γ = 0.5, epochs = 10, learning rate = 1e-5, batch size = 64.
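A minimal sketch of those Table 5 settings gathered into a training configuration; the key names and checkpoint path are illustrative assumptions, since the released code may organize them differently:

    # Hyper-parameters quoted from Table 5, collected into a config dict.
    # Key names and the checkpoint path are illustrative, not the repo's API.
    config = {
        "lambda": 0.5,          # loss-weighting coefficient λ
        "gamma": 0.5,           # loss-weighting coefficient γ
        "epochs": 10,
        "learning_rate": 1e-5,
        "batch_size": 64,
        "init_checkpoint": "diora.pt",  # CLIORA is initialized from DIORA
    }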