VPGTrans: Transfer Visual Prompt Generator across LLMs

Authors: Ao Zhang, Hao Fei, Yuan Yao, Wei Ji, Li Li, Zhiyuan Liu, Tat-Seng Chua

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Via extensive experiments on the transfer across LLM sizes and types (cf. Sections 4 and 5), we gain the following key observations:
Researcher Affiliation | Academia | Ao Zhang (1), Hao Fei (1), Yuan Yao (2), Wei Ji (1), Li Li (1), Zhiyuan Liu (2), Tat-Seng Chua (1); (1) NExT++ Lab, School of Computing, National University of Singapore; (2) Department of Computer Science and Technology, Tsinghua University
Pseudocode | No | No structured pseudocode or algorithm blocks were found in the paper.
Open Source Code | Yes | All code and models are released at https://github.com/VPGTrans/VPGTrans.
Open Datasets | Yes | For all of the exploration experiments, we adopt the human-annotated COCO caption dataset [34] and the web image-text SBU dataset [40], which results in 1.4 million image-text pairs.
Dataset Splits | No | The paper does not provide specific train/validation/test dataset splits (e.g., percentages or sample counts) for reproducibility, although it mentions evaluating on common datasets and includes 'val' in table headers.
Hardware Specification | Yes | For example, training a BLIP-2 FlanT5-XXL needs over 600 A100-GPU hours on over 100 million image-text pairs. The word converter training only requires updating a linear layer on tokenized text data and typically takes less than 10 minutes on 1 A100 GPU with less than 15 GB of GPU memory. (A sketch of such a linear word converter appears after this table.)
Software Dependencies | No | The paper mentions using FP16 and BFloat16 and following BLIP-2's open-source code, but does not provide specific software names with version numbers (e.g., Python or PyTorch versions).
Experiment Setup | Yes | For the learning rate, we first conduct a linear warm-up from 1e-6 to 1e-4, and then use a cosine learning rate schedule with a minimum lr of 1e-5 for 10 epochs. Specifically, we set the batch sizes to 1,728 and 1,152 for the OPT- and Flan-T5-based models, respectively. (A sketch of this schedule appears below.)
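
The learning-rate schedule quoted in the Experiment Setup row is easy to reconstruct. Below is a minimal sketch in plain Python: only the endpoint values (warm-up from 1e-6 to 1e-4, cosine decay to a floor of 1e-5) come from the paper; the step counts and the function name lr_at_step are illustrative assumptions, not values from the paper or its code.

```python
import math

def lr_at_step(step, total_steps, warmup_steps,
               warmup_start=1e-6, peak=1e-4, floor=1e-5):
    """Linear warm-up from `warmup_start` to `peak`, then cosine decay to `floor`.

    Mirrors the schedule quoted above (1e-6 -> 1e-4 warm-up, cosine to 1e-5);
    `total_steps` and `warmup_steps` are assumptions, not values from the paper.
    """
    if step < warmup_steps:
        # Linear interpolation during the warm-up phase.
        return warmup_start + (peak - warmup_start) * step / max(1, warmup_steps)
    # Cosine decay from `peak` down to `floor` over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return floor + 0.5 * (peak - floor) * (1.0 + math.cos(math.pi * progress))

# Example: print the learning rate at a few points of a run.
if __name__ == "__main__":
    total, warmup = 10_000, 1_000  # illustrative step counts
    for s in (0, 500, 1_000, 5_000, 10_000):
        print(s, f"{lr_at_step(s, total, warmup):.2e}")
```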
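
The Hardware Specification row notes that word converter training only updates a single linear layer on tokenized text data. The snippet below is a rough, non-authoritative sketch of what such a training step could look like: the embedding sizes, vocabulary size, optimizer, MSE objective, and the assumption that the source and target LLMs share a tokenizer are all illustrative, not taken from the paper or the released code.

```python
import torch
import torch.nn as nn

# Hypothetical embedding tables for a source and a target LLM assumed to share a
# tokenizer; all sizes here are illustrative only.
vocab_size, d_src, d_tgt = 50_000, 2048, 4096
src_emb = nn.Embedding(vocab_size, d_src)   # stand-in for the source LLM's word embeddings
tgt_emb = nn.Embedding(vocab_size, d_tgt)   # stand-in for the target LLM's word embeddings

converter = nn.Linear(d_src, d_tgt)         # the single linear layer being trained
optimizer = torch.optim.AdamW(converter.parameters(), lr=1e-4)

def train_step(token_ids: torch.Tensor) -> float:
    """One converter update on a batch of token ids from tokenized text."""
    with torch.no_grad():
        x = src_emb(token_ids)               # frozen source-side embeddings
        y = tgt_emb(token_ids)               # frozen target-side embeddings
    loss = nn.functional.mse_loss(converter(x), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Example usage with a random batch of token ids (batch of 32 sequences, length 64).
print(train_step(torch.randint(0, vocab_size, (32, 64))))
```

Because only the converter's weights receive gradients, the per-step cost is tiny, which is consistent with the quoted figure of under 10 minutes on a single A100 with under 15 GB of memory.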