Data Filtering Networks

Authors: Alex Fang, Albin Madappally Jose, Amit Jain, Ludwig Schmidt, Alexander T Toshev, Vaishaal Shankar

ICLR 2024

Reproducibility Variable Result LLM Response
Research Type Experimental Specifically, our best performing dataset DFN-5B enables us to train state-of-the-art CLIP models for their compute budgets: among other improvements on a variety of tasks, a ViT-H trained on our dataset achieves 84.4% zero-shot transfer accuracy on ImageNet, out-performing models trained on other datasets such as LAION-2B, DataComp-1B, or OpenAI's WIT.
Researcher Affiliation Collaboration Alex Fang (1,2), Albin Madappally Jose (1), Amit Jain (1), Ludwig Schmidt (2), Alexander Toshev (1), Vaishaal Shankar (1) — (1) Apple, (2) University of Washington
Pseudocode Yes We show pseudocode of the basic CLIP filtering operation in Appendix H.
Open Source Code No The paper states it releases a dataset (DFN-2B) and model checkpoints, but does not explicitly state the release of the source code for the methodology described in the paper. The 'Model Link' in Table 8 points to checkpoints, not source code.
Open Datasets Yes In addition, we release DFN-2B for the community to enable research on large image-text models. (...) We train a ViT-B/32 on Conceptual 12M, Conceptual Captions 3M, and Shutterstock 15M (Changpinyo et al., 2021; Sharma et al., 2018; Nguyen et al., 2023).
Dataset Splits Yes DataComp provides a multi-scale evaluation framework for datasets by measuring CLIP model zero-shot performance. It provides 4 nested unfiltered image-text pair pools of increasing size. In this work, we use the medium (128M datapoints), large (1.28B datapoints), and xlarge (12.8B datapoints) pools. We also follow the DataComp guidelines of model hyperparameters for each of these pools, which are ViT-B/32 for medium, ViT-B/16 for large, and ViT-L/14 for xlarge.
Hardware Specification Yes Our actual training runs on both Nvidia A100s and TPU v4s.
Software Dependencies No The paper mentions software dependencies such as OpenCLIP and AXLearn but does not provide specific version numbers for these software components.
Experiment Setup Yes Exact hyperparameters can be found in Table 6. (...) DFNs trained for ablations use DataComp large scale hyperparameters with a ViT-B/32 instead of a ViT-B/16. Final DFNs that induce DC-2B train for 5.12B samples, 16,384 batch size, and 2,000 steps of warmup.
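The Pseudocode row above refers to the basic CLIP filtering operation (the paper's own pseudocode is in its Appendix H). A minimal sketch of the general idea — score image-text pairs by CLIP embedding similarity and keep the top-scoring fraction — is below; the function name, the `keep_fraction` parameter, and the precomputed-embedding inputs are illustrative assumptions, not the paper's exact pipeline.

```python
import numpy as np

def clip_filter(image_embs, text_embs, keep_fraction=0.3):
    """Keep the image-text pairs whose CLIP cosine similarity is highest.

    image_embs, text_embs: (n, d) arrays of paired embeddings.
    Returns the sorted indices of the retained pairs.
    """
    # Normalize so the row-wise dot product is cosine similarity.
    image_embs = image_embs / np.linalg.norm(image_embs, axis=1, keepdims=True)
    text_embs = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    scores = np.sum(image_embs * text_embs, axis=1)

    # Retain the top keep_fraction of pairs by score.
    n_keep = int(len(scores) * keep_fraction)
    keep_idx = np.argsort(scores)[::-1][:n_keep]
    return np.sort(keep_idx)
```

A data filtering network, as the paper frames it, plays the role of the scoring model here: a stronger filtering model yields better-ranked pairs and hence a better induced dataset.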
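The Experiment Setup row quotes the final-DFN training settings directly. As a quick consistency check, they can be written out as a config sketch; the key names are illustrative, not taken from the paper's actual configuration files.

```python
# Final DFN training settings as quoted in the section; key names are
# illustrative assumptions, not the paper's actual config schema.
final_dfn_config = {
    "seen_samples": 5_120_000_000,  # "5.12B samples"
    "batch_size": 16_384,
    "warmup_steps": 2_000,
    "arch": "ViT-B/32",  # ablation DFNs use ViT-B/32 at DataComp large scale
}

# Optimizer steps implied by samples seen divided by batch size.
total_steps = final_dfn_config["seen_samples"] // final_dfn_config["batch_size"]
# 5.12B / 16,384 = 312,500 steps, of which 2,000 are warmup.
```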