TAN Without a Burn: Scaling Laws of DP-SGD
Authors: Tom Sander, Pierre Stock, Alexandre Sablayrolles
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We apply the proposed method on CIFAR-10 and ImageNet and, in particular, strongly improve the state-of-the-art on ImageNet with a +9 points gain in top-1 accuracy for a privacy budget ε = 8. |
| Researcher Affiliation | Collaboration | ¹CMAP, Ecole polytechnique, Palaiseau, France; ²Meta AI, Paris, France. Correspondence to: Tom Sander <tomsander@meta.com>. |
| Pseudocode | No | The paper does not contain any section explicitly labeled 'Pseudocode' or 'Algorithm', nor does it present any structured algorithmic steps in a code-like format. |
| Open Source Code | Yes | We open-source the training code at https://github.com/facebookresearch/tan. |
| Open Datasets | Yes | We use the CIFAR-10 dataset (Krizhevsky et al., 2009) which contains 50K 32 × 32 images grouped in 10 classes. The ImageNet dataset (Deng et al., 2009; Russakovsky et al., 2014) contains 1.2 million images partitioned into 1000 categories. |
| Dataset Splits | No | The paper mentions training and testing data, but does not explicitly provide details about a validation dataset split or how it is defined (e.g., percentages, sample counts, or predefined splits). |
| Hardware Specification | Yes | Each hyper-parameter search for ImageNet at B = 16,384 takes 4 days using 32 A100 GPUs; we reduce it to less than one day on a single A100 GPU. |
| Software Dependencies | No | The paper mentions software like Opacus, timm, and PyTorch, citing their respective sources, but does not provide specific version numbers for these software dependencies (e.g., 'PyTorch 1.9'). |
| Experiment Setup | Yes | We search over learning rates lr ∈ [1, 2, 4, 8, 12, 16], momentum parameters µ ∈ [0, 0.1, 0.5, 0.9, 1] and dampening factors d ∈ [0, 0.1, 0.5, 0.9, 1]. We use exponential moving average (EMA) on the weights (Tan & Le, 2019) with a decay parameter in [0.9, 0.99, 0.999, 0.9999, 0.99999]. |
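
To give a sense of the size of the search space quoted in the Experiment Setup row, the following is a minimal sketch that enumerates the listed values. It assumes a full Cartesian product over learning rate, momentum, and dampening, which the quoted text does not confirm, and it treats the EMA decay as an optional fourth axis. The names `sgd_grid`, `full_grid`, and the constant names are hypothetical and are not taken from the paper or its repository. The dictionary keys `lr`, `momentum`, and `dampening` were chosen to match the keyword arguments of `torch.optim.SGD`.

```python
from itertools import product

# Values quoted in the "Experiment Setup" row of the table.
LEARNING_RATES = [1, 2, 4, 8, 12, 16]
MOMENTA = [0, 0.1, 0.5, 0.9, 1]
DAMPENINGS = [0, 0.1, 0.5, 0.9, 1]
EMA_DECAYS = [0.9, 0.99, 0.999, 0.9999, 0.99999]


def sgd_grid():
    """Yield every (lr, momentum, dampening) combination as a dict.

    Assumption: the search is a full Cartesian product over the three lists.
    """
    for lr, momentum, dampening in product(LEARNING_RATES, MOMENTA, DAMPENINGS):
        yield {"lr": lr, "momentum": momentum, "dampening": dampening}


def full_grid():
    """Yield the same combinations crossed with each EMA decay value.

    Assumption: EMA decay is searched jointly with the optimizer settings.
    """
    for config in sgd_grid():
        for ema_decay in EMA_DECAYS:
            yield {**config, "ema_decay": ema_decay}


configs = list(sgd_grid())
configs_with_ema = list(full_grid())
print(len(configs), len(configs_with_ema))
```

Under these assumptions the optimizer grid alone has 6 × 5 × 5 = 150 configurations, and 750 if the EMA decay is searched jointly, which helps explain why the cost of each hyper-parameter search reported in the Hardware Specification row matters.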