Data Filtering Networks

Authors: Alex Fang, Albin Madappally Jose, Amit Jain, Ludwig Schmidt, Alexander T Toshev, Vaishaal Shankar

ICLR 2024

Reproducibility Variable Result LLM Response
Research Type Experimental Specifically, our best performing dataset DFN-5B enables us to train state-of-the-art CLIP models for their compute budgets: among other improvements on a variety of tasks, a ViT-H trained on our dataset achieves 84.4% zero-shot transfer accuracy on ImageNet, out-performing models trained on other datasets such as LAION-2B, DataComp-1B, or OpenAI's WIT.
Researcher Affiliation Collaboration Alex Fang (1,2), Albin Madappally Jose (1), Amit Jain (1), Ludwig Schmidt (2), Alexander Toshev (1), Vaishaal Shankar (1) — (1) Apple, (2) University of Washington
Pseudocode Yes We show pseudocode of the basic CLIP filtering operation in Appendix H.
Open Source Code No The paper states it releases a dataset (DFN-2B) and model checkpoints, but does not explicitly state the release of the source code for the methodology described in the paper. The 'Model Link' in Table 8 points to checkpoints, not source code.
Open Datasets Yes In addition, we release DFN-2B for the community to enable research on large image-text models. (...) We train a ViT-B/32 on Conceptual 12M, Conceptual Captions 3M, and Shutterstock 15M (Changpinyo et al., 2021; Sharma et al., 2018; Nguyen et al., 2023).
Dataset Splits Yes DataComp provides a multi-scale evaluation framework for datasets by measuring CLIP model zero-shot performance. It provides 4 nested unfiltered image-text pair pools of increasing size. In this work, we use the medium (128M datapoints), large (1.28B datapoints), and xlarge (12.8B datapoints) pools. We also follow the DataComp guidelines of model hyperparameters for each of these pools, which are ViT-B/32 for medium, ViT-B/16 for large, and ViT-L/14 for xlarge.
Hardware Specification Yes Our actual training runs on both Nvidia A100s and TPU v4s.
Software Dependencies No The paper mentions software dependencies such as OpenCLIP and AXLearn but does not provide specific version numbers for these software components.
Experiment Setup Yes Exact hyperparameters can be found in Table 6. (...) DFNs trained for ablations use DataComp large scale hyperparameters with a ViT-B/32 instead of a ViT-B/16. Final DFNs that induce DC-2B train for 5.12B samples, 16,384 batch size, and 2,000 steps of warmup.
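The Pseudocode row above refers to the basic CLIP filtering operation (the paper's own pseudocode is in its Appendix H). A minimal sketch of the general idea — score image-text pairs by CLIP embedding similarity and keep the top-scoring fraction — is below; the function name, the `keep_fraction` parameter, and the precomputed-embedding inputs are illustrative assumptions, not the paper's exact pipeline.

```python
import numpy as np

def clip_filter(image_embs, text_embs, keep_fraction=0.3):
    """Keep the image-text pairs whose CLIP cosine similarity is highest.

    image_embs, text_embs: (n, d) arrays of paired embeddings.
    Returns the sorted indices of the retained pairs.
    """
    # Normalize so the row-wise dot product is cosine similarity.
    image_embs = image_embs / np.linalg.norm(image_embs, axis=1, keepdims=True)
    text_embs = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    scores = np.sum(image_embs * text_embs, axis=1)

    # Retain the top keep_fraction of pairs by score.
    n_keep = int(len(scores) * keep_fraction)
    keep_idx = np.argsort(scores)[::-1][:n_keep]
    return np.sort(keep_idx)
```

A data filtering network, as the paper frames it, plays the role of the scoring model here: a stronger filtering model yields better-ranked pairs and hence a better induced dataset.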
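The Experiment Setup row quotes the final-DFN training settings directly. As a quick consistency check, they can be written out as a config sketch; the key names are illustrative, not taken from the paper's actual configuration files.

```python
# Final DFN training settings as quoted in the section; key names are
# illustrative assumptions, not the paper's actual config schema.
final_dfn_config = {
    "seen_samples": 5_120_000_000,  # "5.12B samples"
    "batch_size": 16_384,
    "warmup_steps": 2_000,
    "arch": "ViT-B/32",  # ablation DFNs use ViT-B/32 at DataComp large scale
}

# Optimizer steps implied by samples seen divided by batch size.
total_steps = final_dfn_config["seen_samples"] // final_dfn_config["batch_size"]
# 5.12B / 16,384 = 312,500 steps, of which 2,000 are warmup.
```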