CLIPLoss and Norm-Based Data Selection Methods for Multimodal Contrastive Learning
Authors: Yiping Wang, Yifang Chen, Wendan Yan, Alex Fang, Wenjing Zhou, Kevin G. Jamieson, Simon S. Du
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We test our methods on the data selection benchmark, DataComp [1]. Compared to the best baseline using only OpenAI's CLIP-L/14, our methods achieve a 5.3% improvement on ImageNet-1k and a 2.8% improvement on 38 downstream evaluation tasks. |
| Researcher Affiliation | Academia | Yiping Wang, University of Washington; Yifang Chen, University of Washington; Wendan Yan, University of Washington; Alex Fang, University of Washington; Wenjing Zhou, University of Michigan; Kevin Jamieson, University of Washington; Simon Shaolei Du, University of Washington |
| Pseudocode | Yes | Algorithm 1: negCLIPLoss |
| Open Source Code | Yes | Codes are available at https://github.com/ypwang61/negCLIPLoss_NormSim. |
| Open Datasets | Yes | We test our methods on the data selection benchmark, DataComp [1]. Compared to the best baseline using only OpenAI's CLIP-L/14, our methods achieve a 5.3% improvement on ImageNet-1k and a 2.8% improvement on 38 downstream evaluation tasks. |
| Dataset Splits | No | The paper states, 'We adhere to the standardized training and evaluation protocols of the DataComp benchmark [1].' and discusses training on subsets of DataComp-medium and evaluating on 38 downstream datasets. While it implies standard splits, it does not explicitly provide validation dataset splits in terms of percentages or sample counts. |
| Hardware Specification | Yes | For MLM, they mention needing 6.1 minutes to process 10k samples on an A100, which results in 1120 A100 hours for our dataset (110M). |
| Software Dependencies | No | The paper mentions 'pytorch-style parallel matrix calculation' and the 'faiss library' for k-means clustering, but does not provide specific version numbers for these or any other software dependencies. |
| Experiment Setup | Yes | We employ the medium-scale training configuration of DataComp (DataComp-medium). It provides a substantial dataset comprising 128 million low-quality, web-curated image-text pairs to be filtered. Once a data subset is obtained by some data filtering strategy, it is used to train a fixed CLIP-B/32 model under a fixed training budget equivalent to one pass over 128 million data points. |
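The core scoring rule summarized above (Algorithm 1, negCLIPLoss) can be illustrated with a minimal NumPy sketch. This is not the authors' exact implementation: it assumes L2-normalized image/text embeddings from a pretrained CLIP teacher and an illustrative temperature, and it scores each pair by its matched similarity minus a batch-level normalization term (the log-partition of the contrastive loss in both directions), so that pairs scoring high only because the whole batch is "easy" are down-weighted relative to raw CLIPScore.

```python
import numpy as np


def neg_cliploss_scores(img_embs: np.ndarray, txt_embs: np.ndarray,
                        tau: float = 0.01) -> np.ndarray:
    """Sketch of a negCLIPLoss-style score for a batch of image-text pairs.

    img_embs, txt_embs: (n, d) arrays of L2-normalized embeddings, where
    row i of each array forms a matched pair. tau is an illustrative
    temperature, not the paper's value.
    """
    # Pairwise similarity logits between every image and every text.
    sims = img_embs @ txt_embs.T / tau
    # Raw CLIPScore-style term: similarity of each matched pair.
    clip_score = np.diag(sims)
    # Batch normalization terms: log-partition over image->text rows
    # and text->image columns, as in the symmetric contrastive loss.
    norm_i2t = np.log(np.exp(sims).sum(axis=1))
    norm_t2i = np.log(np.exp(sims).sum(axis=0))
    # Higher score = pair stands out relative to the rest of the batch.
    return clip_score - 0.5 * (norm_i2t + norm_t2i)
```

In a data-selection pipeline, this score would be computed over random batches with a frozen teacher model and the top-ranked fraction of pairs kept for training; the formula here is only a schematic of that idea.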