Differentially Private Representation Learning via Image Captioning
Authors: Tom Sander, Yaodong Yu, Maziar Sanjabi, Alain Oliviero Durmus, Yi Ma, Kamalika Chaudhuri, Chuan Guo
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through a series of engineering tricks, we successfully train a DP image captioner (DP-Cap) on a 233M subset of LAION-2B from scratch using a reasonable amount of computation, obtaining unprecedented high-quality image features that can be used in a variety of downstream vision and vision-language tasks. For example, under a privacy budget of ε = 8 for the LAION dataset, a linear classifier trained on top of learned DP-Cap features attains 65.8% accuracy on ImageNet-1K, considerably improving the previous SOTA of 56.5%. |
| Researcher Affiliation | Collaboration | Meta; CMAP, École polytechnique; UC Berkeley; UCSD. |
| Pseudocode | No | The paper describes the methods in prose and uses mathematical equations for DP and loss functions, but does not provide structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is available at https://github.com/facebookresearch/dpcap. |
| Open Datasets | Yes | Following the approach introduced by Yu et al. (2023), we first pre-train on the Shaders21k dataset (Baradad et al., 2022) of synthetic images. We then train with DP-SGD on a subset comprising 233 million deduplicated (using SemDeDup (Abbas et al., 2023)), NSFW-filtered and face-blurred (using an approach similar to Yang et al. (2021)) image-caption pairs from the (English-only) LAION-2B dataset (Schuhmann et al., 2022). |
| Dataset Splits | No | We use the ImageNet-1K (Deng et al., 2009a; Russakovsky et al., 2014), CIFAR-10/100 (Krizhevsky et al., 2009), Places-365/205 (Zhou et al., 2014) and iNaturalist-2021 (Van Horn et al., 2021) image classification datasets to assess the performance of learned image representations via full linear probing, few-shot linear probing, and zero-shot prediction. (A linear-probing sketch follows the table.) |
| Hardware Specification | Yes | a naive implementation of DP-Cap with per-sample gradient computation using functorch would take approximately 61 days(!) on 128 NVIDIA V100 GPUs. The number of GPU hours is estimated on a single NVIDIA V100 GPU with 32GB memory using 100K samples. (A per-sample-gradient sketch follows the table.) |
| Software Dependencies | No | We use RDP accounting with subsampling from the Opacus library (Yousefpour et al., 2021). Combining these two techniques with DP-SGD requires careful consideration to ensure both a correct DP guarantee as well as numerical stability. We detail the implementation in Appendix A.2. (An accounting sketch follows the table.) |
| Experiment Setup | Yes | Our choice of gradient clipping factor is C = 1, as we did not observe any performance improvement with other values. We always use AdamW (Loshchilov & Hutter, 2018) for training. We use a learning rate of 5.12 × 10⁻⁴. ... We use a maximum length of 40 tokens to process the LAION captions. We use a linear schedule, with 40% of warm-up iterations, and 2× the entire training as decay horizon. As opposed to what was previously observed (De et al., 2022; Sander et al., 2023), the learning rate schedule played an important role for us with DP-SGD training. We use a weight decay of 0.05. For our ε = 8 models, we limited training to 32 epochs, a process that took 5 days utilizing 128 V100 GPUs for the Base model. To fit the privacy budget while utilizing a batch size of 1.3 million and training for 32 epochs, RDP analysis yields σ = 0.728. (DP-SGD update and accounting sketches follow the table.) |
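
The Hardware Specification row's reference to per-sample gradient computation with functorch is the core cost of DP-SGD. Below is a minimal sketch of how per-sample gradients are typically obtained with `torch.func`, functorch's successor in current PyTorch; the toy linear model and batch shapes are illustrative assumptions, not the paper's ViT-based captioner.

```python
# Minimal sketch: per-sample gradients with torch.func (functorch's successor).
# The tiny model and shapes are illustrative; DP-Cap itself is an image
# encoder plus text decoder, not a single linear layer.
import torch
import torch.nn as nn
from torch.func import functional_call, grad, vmap

model = nn.Linear(512, 10)  # stand-in for the captioner
params = {name: p.detach() for name, p in model.named_parameters()}

def loss_fn(params, x, y):
    # Loss for a single example; vmap adds the batch dimension back.
    logits = functional_call(model, params, (x.unsqueeze(0),))
    return nn.functional.cross_entropy(logits, y.unsqueeze(0))

# vmap over the batch dimension yields one gradient per example, which
# DP-SGD needs so that each sample's contribution can be clipped.
per_sample_grad_fn = vmap(grad(loss_fn), in_dims=(None, 0, 0))

x = torch.randn(32, 512)          # batch of 32 feature vectors
y = torch.randint(0, 10, (32,))   # labels
per_sample_grads = per_sample_grad_fn(params, x, y)
# each dict entry has shape (32, *param.shape)
```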
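Given per-sample gradients, the Experiment Setup row's numbers slot into the standard DP-SGD update: clip each sample's gradient to norm C = 1, sum, add Gaussian noise scaled by σ·C with σ = 0.728, and hand the averaged result to AdamW. The sketch below reuses `model` and `per_sample_grads` from the previous block and is a plain-vanilla DP-SGD step, not the paper's optimized implementation (which relies on additional engineering tricks detailed in its appendix).

```python
# Plain DP-SGD update sketch with the paper's reported values
# (C = 1, sigma = 0.728, AdamW, lr 5.12e-4, weight decay 0.05).
# The batch here is illustrative, not the paper's 1.3M "massive batch".
C = 1.0        # gradient clipping factor
sigma = 0.728  # noise multiplier from RDP analysis

optimizer = torch.optim.AdamW(model.parameters(), lr=5.12e-4, weight_decay=0.05)

# 1) Per-sample L2 norms across all parameters.
flat = torch.cat([g.reshape(g.shape[0], -1) for g in per_sample_grads.values()], dim=1)
norms = flat.norm(dim=1)                     # shape (batch,)
scale = (C / (norms + 1e-6)).clamp(max=1.0)  # per-sample clipping factor

optimizer.zero_grad()
batch_size = norms.shape[0]
for name, p in model.named_parameters():
    g = per_sample_grads[name]                               # (batch, *p.shape)
    clipped = g * scale.view(-1, *([1] * (g.dim() - 1)))     # 2) clip each sample
    noisy = clipped.sum(dim=0) + sigma * C * torch.randn_like(p)  # 3) add noise
    p.grad = noisy / batch_size                              # 4) average
optimizer.step()
```

Clipping bounds each example's influence on the update, which is what makes the Gaussian noise calibrated to σ·C yield a valid DP guarantee under the accounting sketched next.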
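The σ = 0.728 figure comes from RDP accounting with subsampling via Opacus. A back-of-the-envelope check is sketched below: the sampling rate and step count are derived from the quoted 233M samples, 1.3M batch size, and 32 epochs, while δ = 1/N is an assumed (common) choice rather than the paper's stated value; the paper's Appendix A.2 has the exact accounting.

```python
# Sketch: RDP accounting with subsampling, using Opacus.
# q, steps, and delta are back-of-the-envelope assumptions from the
# quoted numbers; the paper's appendix gives the precise accounting.
from opacus.accountants import RDPAccountant

N = 233_000_000   # deduplicated LAION subset
B = 1_300_000     # "massive batch" size
epochs = 32
sigma = 0.728
delta = 1 / N     # assumed; choosing delta < 1/N is a common convention

q = B / N                    # Poisson subsampling rate
steps = int(epochs * N / B)  # ~5,735 DP-SGD steps over 32 epochs

accountant = RDPAccountant()
for _ in range(steps):
    accountant.step(noise_multiplier=sigma, sample_rate=q)
print(accountant.get_epsilon(delta=delta))  # should land near epsilon ≈ 8
```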
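Finally, the main evaluation protocol quoted under Dataset Splits is linear probing: training a linear classifier on frozen features. A minimal sketch follows, under assumed dimensions (ViT-Base features, ImageNet-1K classes) and assumed probe hyperparameters; in the paper the features come from the DP-Cap image encoder.

```python
# Sketch: full linear probing of frozen features. The encoder is stubbed
# out with random features; dimensions and probe hyperparameters are
# assumptions, not the paper's exact evaluation recipe.
import torch
import torch.nn as nn

feat_dim, num_classes = 768, 1000  # assumed: ViT-Base features, ImageNet-1K

probe = nn.Linear(feat_dim, num_classes)
opt = torch.optim.AdamW(probe.parameters(), lr=1e-3)  # assumed probe settings

def probe_step(features: torch.Tensor, labels: torch.Tensor) -> float:
    """One training step on pre-extracted features of shape (B, feat_dim)."""
    logits = probe(features.detach())  # detach: no gradient into the encoder
    loss = nn.functional.cross_entropy(logits, labels)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

# usage with random stand-in features
loss = probe_step(torch.randn(256, feat_dim), torch.randint(0, num_classes, (256,)))
```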