Contrastive Alignment of Vision to Language Through Parameter-Efficient Transfer Learning

Authors: Zaid Khan, Yun Fu

ICLR 2023

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "We describe a series of experiments: we show that existing knowledge is conserved more strongly in parameter-efficient training and that parameter-efficient training scales with model and dataset size." |
| Researcher Affiliation | Academia | Zaid Khan, Yun Fu; Northeastern University, Boston, USA; {khan.za, y.fu}@northeastern.edu |
| Pseudocode | No | The paper does not contain explicit pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code and weights at https://github.com/codezakh/LilT. |
| Open Datasets | Yes | "We draw 591,753 image-text pairs from the training set of COCO2014 (Lin et al., 2014), following the split of Karpathy & Fei-Fei (2017)." |
| Dataset Splits | Yes | Same passage as above: the Karpathy & Fei-Fei (2017) split of COCO2014 is used (see the loading sketch below). |
| Hardware Specification | Yes | "We train each model with a batch size of 512 on 4x NVIDIA A6000 GPUs for 15 epochs, using the AdamW optimizer (Loshchilov & Hutter, 2017) with a weight decay of 0.02." |
| Software Dependencies | No | The paper names the AdamW optimizer but does not give version numbers for general software dependencies (e.g., Python, PyTorch) or for any specific libraries beyond the optimizer. |
| Experiment Setup | Yes | "We train each model with a batch size of 512 on 4x NVIDIA A6000 GPUs for 15 epochs, using the AdamW optimizer (Loshchilov & Hutter, 2017) with a weight decay of 0.02. The learning rate is warmed up to 1e-4 in the first 10 epochs, and then decayed to 1e-5. We use random crops of resolution 256×256 with RandAugment (Cubuk et al., 2020), with color transformations removed following Li et al. (2021a)." (See the configuration sketch below.) |
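As context for the Open Datasets and Dataset Splits rows, here is a minimal sketch of how the Karpathy & Fei-Fei (2017) training split is typically materialized into image-text pairs. The file name `dataset_coco.json` and the field names follow the commonly distributed split file and are assumptions, not details taken from the paper.

```python
import json

# Hypothetical loader for the commonly distributed Karpathy split file;
# the paper does not describe its loading code, so names here are assumptions.
with open("dataset_coco.json") as f:
    karpathy = json.load(f)

# The Karpathy training split is the union of the "train" and "restval"
# portions of COCO2014; pairing every image with each of its captions
# yields the image-text pairs used for contrastive training.
pairs = [
    (image["filename"], sentence["raw"])
    for image in karpathy["images"]
    if image["split"] in ("train", "restval")
    for sentence in image["sentences"]
]
print(f"{len(pairs)} image-text pairs")  # the paper reports 591,753
```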
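For the Hardware Specification and Experiment Setup rows, the following is a minimal PyTorch sketch of the reported training configuration. The model stand-in, the linear warmup/decay shapes, and the RandAugment settings are assumptions; the paper states only the schedule endpoints (1e-4 and 1e-5), the crop resolution, and that color transformations are removed from RandAugment.

```python
import torch
from torchvision import transforms

# Stand-in for the dual-encoder being trained; hypothetical placeholder.
model = torch.nn.Linear(512, 512)

# 256x256 random crops with RandAugment (Cubuk et al., 2020). The paper
# removes color transformations following Li et al. (2021a); torchvision's
# RandAugment does not expose its op list, so this is an approximation.
train_transform = transforms.Compose([
    transforms.RandomResizedCrop(256),
    transforms.RandAugment(),
    transforms.ToTensor(),
])

# AdamW with weight decay 0.02 and peak learning rate 1e-4, as reported.
# The batch size of 512 is spread across 4x NVIDIA A6000 GPUs in the paper.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=0.02)

EPOCHS, WARMUP_EPOCHS = 15, 10

def lr_factor(epoch: int) -> float:
    """Warm up to 1e-4 over the first 10 epochs, then decay to 1e-5.
    The linear shapes are assumptions; the paper gives only the endpoints."""
    if epoch < WARMUP_EPOCHS:
        return (epoch + 1) / WARMUP_EPOCHS
    progress = (epoch - WARMUP_EPOCHS + 1) / (EPOCHS - WARMUP_EPOCHS)
    return 1.0 - 0.9 * progress  # reaches 0.1 (lr 1e-5) at the final epoch

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_factor)
```

Calling `scheduler.step()` once per epoch reproduces the stated endpoints; a per-step schedule would be an equally plausible reading of the paper's description.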