Playing Lottery Tickets with Vision and Language

Authors: Zhe Gan, Yen-Chun Chen, Linjie Li, Tianlong Chen, Yu Cheng, Shuohang Wang, Jingjing Liu, Lijuan Wang, Zicheng Liu

AAAI 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We use UNITER as the main testbed (also test on LXMERT and ViLT), and consolidate 7 representative VL tasks for experiments, including visual question answering, visual commonsense reasoning, visual entailment, referring expression comprehension, image-text retrieval, GQA, and NLVR2. Through comprehensive analysis, we summarize our main findings as follows."
Researcher Affiliation | Collaboration | ¹Microsoft Corporation, ²University of Texas at Austin, ³Tsinghua University
Pseudocode | Yes | "The full IMP procedure is provided in the Appendix." (A hedged sketch of iterative magnitude pruning follows this table.)
Open Source Code | No | The paper states, "We use the official UNITER/LXMERT/ViLT code bases for experiments," which refers to third-party code. It does not provide a link to, or an explicit statement about releasing, code for the authors' own method.
Open Datasets | Yes | "We use both the in-domain and out-of-domain image-text datasets for IMP-based pre-training, including COCO (Lin et al. 2014), Visual Genome (Krishna et al. 2017), Conceptual Captions (Sharma et al. 2018), and SBU Captions (Ordonez, Kulkarni, and Berg 2011)."
Dataset Splits | Yes | "For VQA, we mainly report results on an internal mini-dev set for faster evaluation of the found tickets, and avoid submitting results to the VQA test server too frequently. This same mini-dev set is also used in UNITER (Chen et al. 2020d)."
Hardware Specification | No | The paper does not specify the hardware (e.g., GPU models, CPU, memory) used to run the experiments.
Software Dependencies | No | The paper mentions using the "official UNITER/LXMERT/ViLT code bases" but does not provide version numbers for any software dependencies or libraries.
Experiment Setup | No | The paper states: "We use the default hyperparameters provided in the UNITER code base without any tuning. For UNITER pre-training, we use all the pre-training tasks to learn the mask, including Masked Language Modeling, Masked Region Modeling, Image-Text Matching, and Word-Region Alignment. See Chen et al. (2020d) for details of these tasks." It defers to an external source for hyperparameter details and does not list them explicitly in the main text.
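
The Pseudocode row above only notes that the paper's full IMP procedure lives in its Appendix. For readers unfamiliar with the technique, below is a minimal PyTorch-style sketch of iterative magnitude pruning with weight rewinding, as commonly used in lottery-ticket studies. The function name `imp_find_ticket`, the caller-supplied `train_fn`, the 20% per-round prune ratio, pruning only weight matrices, and rewinding to the initial weights are illustrative assumptions, not the authors' exact recipe.

```python
import torch

def imp_find_ticket(model, train_fn, prune_ratio=0.2, rounds=5):
    """Hedged sketch of iterative magnitude pruning (IMP).

    model       - a torch.nn.Module (e.g., a UNITER-style encoder)
    train_fn    - caller-supplied: trains `model` in place while keeping
                  masked weights at zero (e.g., re-applying the masks
                  after every optimizer step)
    prune_ratio - fraction of the *remaining* weights removed per round
    rounds      - number of train / prune / rewind iterations
    """
    # Snapshot the initialization (or an early-training rewind point).
    init_state = {k: v.clone() for k, v in model.state_dict().items()}

    # Start with a dense (all-ones) mask for each prunable weight tensor;
    # biases and LayerNorm parameters are left unpruned in this sketch.
    masks = {name: torch.ones_like(p)
             for name, p in model.named_parameters() if p.dim() > 1}

    for _ in range(rounds):
        # 1) Train the currently masked subnetwork.
        train_fn(model, masks)

        # 2) Globally rank surviving weights by magnitude and prune the
        #    smallest `prune_ratio` fraction of them.
        surviving = torch.cat([
            p.data.abs().flatten()[masks[name].flatten().bool()]
            for name, p in model.named_parameters() if name in masks
        ])
        threshold = torch.quantile(surviving, prune_ratio)
        for name, p in model.named_parameters():
            if name in masks:
                masks[name] *= (p.data.abs() > threshold).float()

        # 3) Rewind surviving weights to their initial values and zero
        #    out the pruned positions.
        model.load_state_dict(init_state)
        with torch.no_grad():
            for name, p in model.named_parameters():
                if name in masks:
                    p.mul_(masks[name])

    return masks
```

In the setting described in the Experiment Setup row, `train_fn` would presumably optimize the combined UNITER pre-training objectives (Masked Language Modeling, Masked Region Modeling, Image-Text Matching, and Word-Region Alignment) to learn the mask, but the exact training loop and hyperparameters are deferred to the UNITER code base and are not spelled out in the paper.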