Playing Lottery Tickets with Vision and Language

Authors: Zhe Gan, Yen-Chun Chen, Linjie Li, Tianlong Chen, Yu Cheng, Shuohang Wang, Jingjing Liu, Lijuan Wang, Zicheng Liu

AAAI 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We use UNITER as the main testbed (also test on LXMERT and ViLT), and consolidate 7 representative VL tasks for experiments, including visual question answering, visual commonsense reasoning, visual entailment, referring expression comprehension, image-text retrieval, GQA, and NLVR2. Through comprehensive analysis, we summarize our main findings as follows."
Researcher Affiliation | Collaboration | ¹Microsoft Corporation, ²University of Texas at Austin, ³Tsinghua University
Pseudocode | Yes | "The full IMP procedure is provided in the Appendix." (A hedged sketch of iterative magnitude pruning follows this table.)
Open Source Code | No | The paper states, "We use the official UNITER/LXMERT/ViLT code bases for experiments," which refers to third-party code. It does not provide a link to, or an explicit statement about releasing, code for the authors' own method.
Open Datasets | Yes | "We use both the in-domain and out-of-domain image-text datasets for IMP-based pre-training, including COCO (Lin et al. 2014), Visual Genome (Krishna et al. 2017), Conceptual Captions (Sharma et al. 2018), and SBU Captions (Ordonez, Kulkarni, and Berg 2011)."
Dataset Splits | Yes | "For VQA, we mainly report results on an internal mini-dev set for faster evaluation of the found tickets, and avoid submitting results to the VQA test server too frequently. This same mini-dev set is also used in UNITER (Chen et al. 2020d)."
Hardware Specification | No | The paper does not specify the hardware (e.g., GPU models, CPU, memory) used to run the experiments.
Software Dependencies | No | The paper mentions using the "official UNITER/LXMERT/ViLT code bases" but does not provide version numbers for any software dependencies or libraries.
Experiment Setup | No | The paper states: "We use the default hyperparameters provided in the UNITER code base without any tuning. For UNITER pre-training, we use all the pre-training tasks to learn the mask, including Masked Language Modeling, Masked Region Modeling, Image-Text Matching, and Word-Region Alignment. See Chen et al. (2020d) for details of these tasks." It defers to an external source for hyperparameter details and does not list them explicitly in the main text.
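
The Pseudocode row above only notes that the paper's full IMP procedure lives in its Appendix. For readers unfamiliar with the technique, below is a minimal PyTorch-style sketch of iterative magnitude pruning with weight rewinding, as commonly used in lottery-ticket studies. The function name `imp_find_ticket`, the caller-supplied `train_fn`, the 20% per-round prune ratio, pruning only weight matrices, and rewinding to the initial weights are illustrative assumptions, not the authors' exact recipe.

```python
import torch

def imp_find_ticket(model, train_fn, prune_ratio=0.2, rounds=5):
    """Hedged sketch of iterative magnitude pruning (IMP).

    model       - a torch.nn.Module (e.g., a UNITER-style encoder)
    train_fn    - caller-supplied: trains `model` in place while keeping
                  masked weights at zero (e.g., re-applying the masks
                  after every optimizer step)
    prune_ratio - fraction of the *remaining* weights removed per round
    rounds      - number of train / prune / rewind iterations
    """
    # Snapshot the initialization (or an early-training rewind point).
    init_state = {k: v.clone() for k, v in model.state_dict().items()}

    # Start with a dense (all-ones) mask for each prunable weight tensor;
    # biases and LayerNorm parameters are left unpruned in this sketch.
    masks = {name: torch.ones_like(p)
             for name, p in model.named_parameters() if p.dim() > 1}

    for _ in range(rounds):
        # 1) Train the currently masked subnetwork.
        train_fn(model, masks)

        # 2) Globally rank surviving weights by magnitude and prune the
        #    smallest `prune_ratio` fraction of them.
        surviving = torch.cat([
            p.data.abs().flatten()[masks[name].flatten().bool()]
            for name, p in model.named_parameters() if name in masks
        ])
        threshold = torch.quantile(surviving, prune_ratio)
        for name, p in model.named_parameters():
            if name in masks:
                masks[name] *= (p.data.abs() > threshold).float()

        # 3) Rewind surviving weights to their initial values and zero
        #    out the pruned positions.
        model.load_state_dict(init_state)
        with torch.no_grad():
            for name, p in model.named_parameters():
                if name in masks:
                    p.mul_(masks[name])

    return masks
```

In the setting described in the Experiment Setup row, `train_fn` would presumably optimize the combined UNITER pre-training objectives (Masked Language Modeling, Masked Region Modeling, Image-Text Matching, and Word-Region Alignment) to learn the mask, but the exact training loop and hyperparameters are deferred to the UNITER code base and are not spelled out in the paper.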