Unsupervised Vision-Language Grammar Induction with Shared Structure Modeling
Authors: Bo Wan, Wenjuan Han, Zilong Zheng, Tinne Tuytelaars
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We introduce a new evaluation metric, Critical Concept Recall Rate (CCRR), to explicitly evaluate VL grammar induction, and show a 2.6% improvement over a strong baseline on Flickr30k Entities. We also evaluate our model via two derived tasks, i.e., language grammar induction and phrase grounding, and improve over the state of the art for both. (A generic recall sketch follows the table.) |
| Researcher Affiliation | Collaboration | Bo Wan¹, Wenjuan Han², Zilong Zheng², Tinne Tuytelaars¹. 1: Department of Electrical Engineering, KU Leuven; 2: Beijing Institute for General Artificial Intelligence, Beijing, China |
| Pseudocode | No | The paper describes its model and algorithms using text, equations, and diagrams (Figures 2, 6, 7, 8, and 9), but it does not include a distinct pseudocode block or a clearly labeled algorithm. |
| Open Source Code | Yes | All the code, processed data, and the trained model in this paper are publicly released at https://github.com/bobwan1995/cliora.git. |
| Open Datasets | Yes | We evaluate our method on the Flickr30k Entities (Plummer et al., 2017) and MSCOCO (Lin et al., 2014) datasets. |
| Dataset Splits | Yes | Flickr30k Entities contains 29,783 training images, 1,000 validation images, and 1,000 test images. We use the same MSCOCO split as Zhao & Titov (2020b): 82,783 training images, 1,000 validation images, and 1,000 test images. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used to run the experiments, such as GPU models, CPU types, or memory specifications. |
| Software Dependencies | No | The paper mentions using several tools and libraries like 'Faster R-CNN', 'RoI-Align', 'ELMo', 'GloVe embedding', and 'Benepar', but it does not specify version numbers for any of these software components, nor for broader frameworks like PyTorch or TensorFlow. |
| Experiment Setup | Yes | We load DIORA as an initialization for CLIORA. Other detailed hyper-parameters are provided in Appendix F (Table 5): λ = 0.5, γ = 0.5, epochs = 10, learning rate = 1e-5, batch size = 64. (A hypothetical configuration sketch follows the table.) |
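
To make the Experiment Setup row concrete, the sketch below wires the reported hyper-parameters (λ = 0.5, γ = 0.5, 10 epochs, learning rate 1e-5, batch size 64) into a PyTorch-style setup. The model class, checkpoint path, optimizer choice, and the role of λ and γ are illustrative assumptions, not the authors' released code; the official implementation is at the repository linked above.

```python
# Hypothetical sketch only: the hyper-parameter values match Table 5 /
# Appendix F, but the checkpoint path, optimizer choice, and the role
# of the λ- and γ-weighted terms are assumptions, not the authors'
# code (see https://github.com/bobwan1995/cliora.git).
import torch

CONFIG = {
    "lambda_": 0.5,   # λ from Table 5 (assumed: an auxiliary-loss weight)
    "gamma": 0.5,     # γ from Table 5 (assumed: an auxiliary-loss weight)
    "epochs": 10,
    "lr": 1e-5,
    "batch_size": 64,
}

def init_from_diora(model: torch.nn.Module, ckpt_path: str) -> None:
    """Load DIORA weights as initialization for CLIORA, as the paper
    describes; non-matching keys are skipped (strict=False)."""
    state = torch.load(ckpt_path, map_location="cpu")
    model.load_state_dict(state, strict=False)

def build_optimizer(model: torch.nn.Module) -> torch.optim.Optimizer:
    """Adam with the learning rate reported in the paper (the optimizer
    type itself is an assumption here)."""
    return torch.optim.Adam(model.parameters(), lr=CONFIG["lr"])
```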
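
Similarly, for the CCRR metric cited in the Research Type row, the exact protocol is defined in the paper; the sketch below shows only a generic span-recall computation, under the purely illustrative assumption that CCRR is the fraction of annotated critical-concept spans recovered, by exact match, among a model's predicted constituents.

```python
# Illustrative only: a generic span-recall computation. What counts as
# a "critical concept" and as a match is defined in the paper; this
# sketch simply assumes exact (start, end) span matching.
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end) token indices of a phrase

def recall_rate(gold_concepts: List[List[Span]],
                predicted_spans: List[List[Span]]) -> float:
    """Fraction of gold concept spans found among the predicted spans,
    micro-averaged over all sentences."""
    hit = total = 0
    for gold, pred in zip(gold_concepts, predicted_spans):
        pred_set = set(pred)
        hit += sum(span in pred_set for span in gold)
        total += len(gold)
    return hit / total if total else 0.0

# Toy usage: two gold concept spans in one sentence, one recovered.
print(recall_rate([[(0, 2), (3, 5)]], [[(0, 2), (2, 5)]]))  # -> 0.5
```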