Playing Lottery Tickets with Vision and Language
Authors: Zhe Gan, Yen-Chun Chen, Linjie Li, Tianlong Chen, Yu Cheng, Shuohang Wang, Jingjing Liu, Lijuan Wang, Zicheng Liu
AAAI 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We use UNITER as the main testbed (also test on LXMERT and ViLT), and consolidate 7 representative VL tasks for experiments, including visual question answering, visual commonsense reasoning, visual entailment, referring expression comprehension, image-text retrieval, GQA, and NLVR2. Through comprehensive analysis, we summarize our main findings as follows. |
| Researcher Affiliation | Collaboration | 1Microsoft Corporation 2University of Texas at Austin 3Tsinghua University |
| Pseudocode | Yes | The full IMP procedure is provided in the Appendix. (A generic sketch of IMP appears below this table.) |
| Open Source Code | No | The paper states, "We use the official UNITER/LXMERT/ViLT code bases for experiments," which refers to third-party code. It does not provide a link or explicit statement about releasing the code for their own methodology. |
| Open Datasets | Yes | We use both the in-domain and out-of-domain image-text datasets for IMP-based pre-training, including COCO (Lin et al. 2014), Visual Genome (Krishna et al. 2017), Conceptual Captions (Sharma et al. 2018), and SBU Captions (Ordonez, Kulkarni, and Berg 2011). |
| Dataset Splits | Yes | For VQA, we mainly report results on an internal mini-dev set for faster evaluation of the found tickets, and avoid submitting results to the VQA test server too frequently. This same mini-dev set is also used in UNITER (Chen et al. 2020d). |
| Hardware Specification | No | The paper does not specify the hardware (e.g., GPU models, CPU, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions using "official UNITER/LXMERT/ViLT code bases" but does not provide specific version numbers for any software dependencies or libraries. |
| Experiment Setup | No | We use the default hyperparameters provided in the UNITER code base without any tuning. For UNITER pre-training, we use all the pre-training tasks to learn the mask, including Masked Language Modeling, Masked Region Modeling, Image-Text Matching, and Word-Region Alignment. See Chen et al. (2020d) for details of these tasks. The paper refers to an external source for hyperparameter details and does not explicitly list them in the main text. |
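
The pruning method the paper builds on is iterative magnitude pruning (IMP) from the lottery-ticket literature; the paper's own full procedure lives in its appendix and is not reproduced here. As a rough illustration only, the following is a minimal PyTorch sketch of generic IMP with weight rewinding. It is not the authors' implementation: `train_fn`, `init_state`, `rounds`, and `prune_fraction` are illustrative placeholders, not hyperparameters from the paper.

```python
import torch

def magnitude_prune(model, masks, prune_fraction):
    """Zero out the smallest-magnitude surviving weights (global pruning).

    Illustrative sketch, not the paper's code: the global-vs-layerwise
    choice and the prune_fraction value are assumptions.
    """
    with torch.no_grad():
        # Collect magnitudes of all weights that are still unpruned.
        surviving = torch.cat([
            param.abs()[masks[name].bool()]
            for name, param in model.named_parameters() if name in masks
        ])
        threshold = torch.quantile(surviving, prune_fraction)
        for name, param in model.named_parameters():
            if name in masks:
                # Keep a weight only if it survived before AND beats the threshold.
                masks[name] *= (param.abs() > threshold).float()
    return masks

def iterative_magnitude_pruning(model, train_fn, init_state,
                                rounds=5, prune_fraction=0.2):
    """Generic IMP loop: train, prune, rewind to initialization, repeat."""
    # Prune only weight matrices (dim > 1); biases and norm params stay dense.
    masks = {name: torch.ones_like(p)
             for name, p in model.named_parameters() if p.dim() > 1}
    for _ in range(rounds):
        train_fn(model, masks)               # caller applies masks during training
        masks = magnitude_prune(model, masks, prune_fraction)
        model.load_state_dict(init_state)    # rewind surviving weights to init
    return masks
```

The rewind step (`load_state_dict(init_state)`) is what distinguishes lottery-ticket IMP from plain prune-and-finetune: each round restarts training from the original initialization with only the surviving connections, so the returned masks define the "winning ticket" subnetwork.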