Measuring Déjà Vu Memorization Efficiently

Authors: Narine Kokhlikyan, Bargav Jayaraman, Florian Bordes, Chuan Guo, Kamalika Chaudhuri

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Our results show that different ways of measuring memorization yield very similar aggregate results. We also find that open-source models typically have lower aggregate memorization than similar models trained on a subset of the data."
Researcher Affiliation | Industry | Narine Kokhlikyan (FAIR at Meta), Bargav Jayaraman (FAIR at Meta), Florian Bordes (FAIR at Meta), Chuan Guo (FAIR at Meta), Kamalika Chaudhuri (FAIR at Meta)
Pseudocode | No | "The paper does not contain any blocks explicitly labeled 'Pseudocode' or 'Algorithm', nor does it present structured steps in a code-like format."
Open Source Code | Yes | "The code is available for both vision and vision-language models."
Open Datasets | Yes | "We conduct all our image representation learning experiments on the ImageNet dataset [Deng et al., 2009]."
Dataset Splits | Yes | "We use 300k examples (300 per class) to train the reference models to learn dataset-level correlations. We measure memorization accuracy on an additional disjoint set of 300k images. For the two-model tests, these images are included in the training set of the target models, but not the reference models. Finally, we use another distinct set of 500k images to predict the nearest foreground object given the representation of a background crop through KNN."
Hardware Specification | Yes | "The reference models are trained on a single machine with 8 NVIDIA V100 GPUs (32 GB per GPU) using a batch size of 128."
Software Dependencies | No | "We train CLIP models using the OpenCLIP framework [Ilharco et al., 2021]."
Experiment Setup | Yes | "We train our models for 200 epochs with a learning rate of 0.0005 and a warmup of 2000 steps for the cosine learning rate scheduler. Our training runs use 512 GB of RAM and 32 NVIDIA A100 GPUs with a global batch size of 32,768."
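The Dataset Splits row mentions predicting the nearest foreground object from a background-crop representation via KNN. The following is a minimal sketch of such a KNN label predictor; the function name, the use of cosine similarity, the choice of k, and the majority-vote rule are all illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np

def knn_predict_labels(public_emb, public_labels, query_emb, k=5):
    """Predict a label for each query embedding by majority vote over its
    k nearest neighbours (cosine similarity) in a labeled public set.

    This is a generic sketch of a KNN probe on learned representations;
    parameter names and similarity metric are assumptions.
    """
    # Normalise rows so the dot product equals cosine similarity.
    pub = public_emb / np.linalg.norm(public_emb, axis=1, keepdims=True)
    qry = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    sims = qry @ pub.T                      # (n_query, n_public)
    nn = np.argsort(-sims, axis=1)[:, :k]   # indices of k nearest neighbours
    votes = public_labels[nn]               # (n_query, k) neighbour labels
    # Majority vote per query.
    return np.array([np.bincount(v).argmax() for v in votes])
```

In a memorization test of this kind, above-chance accuracy on crops that only the target model saw during training is the signal of interest; here the probe itself is all that is sketched.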
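The Experiment Setup row describes a cosine learning-rate schedule with a 2000-step warmup and a peak learning rate of 0.0005. A minimal sketch of that schedule is below, assuming linear warmup to the peak and cosine decay to zero; the total step count and the zero decay floor are illustrative assumptions not stated in the row.

```python
import math

def lr_at_step(step, peak_lr=0.0005, warmup_steps=2000, total_steps=200_000):
    """Linear warmup to peak_lr over warmup_steps, then cosine decay to 0.

    peak_lr and warmup_steps come from the reported setup; total_steps
    and the zero floor are assumptions for illustration.
    """
    if step < warmup_steps:
        # Linear ramp from 0 to peak_lr.
        return peak_lr * step / warmup_steps
    # Cosine decay from peak_lr down to 0 over the remaining steps.
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * progress))

print(lr_at_step(0))        # start of warmup: 0.0
print(lr_at_step(2000))     # end of warmup: peak, 0.0005
print(lr_at_step(200_000))  # end of training: ~0.0
```

In practice this is usually expressed through a framework scheduler (e.g. OpenCLIP's built-in cosine schedule) rather than hand-rolled, but the closed form makes the warmup/decay shape explicit.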