TAN Without a Burn: Scaling Laws of DP-SGD

Authors: Tom Sander, Pierre Stock, Alexandre Sablayrolles

ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We apply the proposed method on CIFAR-10 and ImageNet and, in particular, strongly improve the state-of-the-art on ImageNet with a +9 points gain in top-1 accuracy for a privacy budget ε = 8. |
| Researcher Affiliation | Collaboration | 1 CMAP, Ecole polytechnique, Palaiseau, France; 2 Meta AI, Paris, France. Correspondence to: Tom Sander <tomsander@meta.com>. |
| Pseudocode | No | The paper does not contain any section explicitly labeled 'Pseudocode' or 'Algorithm', nor does it present any structured algorithmic steps in a code-like format. |
| Open Source Code | Yes | We open-source the training code at https://github.com/facebookresearch/tan. |
| Open Datasets | Yes | We use the CIFAR-10 dataset (Krizhevsky et al., 2009), which contains 50K 32×32 images grouped in 10 classes. The ImageNet dataset (Deng et al., 2009; Russakovsky et al., 2014) contains 1.2 million images partitioned into 1000 categories. |
| Dataset Splits | No | The paper mentions training and testing data, but does not explicitly provide details about a validation dataset split or how it is defined (e.g., percentages, sample counts, or predefined splits). |
| Hardware Specification | Yes | Each hyper-parameter search for ImageNet at B = 16,384 takes 4 days using 32 A100 GPUs; we reduce it to less than one day on a single A100 GPU. |
| Software Dependencies | No | The paper mentions software like Opacus, timm, and PyTorch, citing their respective sources, but does not provide specific version numbers for these dependencies (e.g., 'PyTorch 1.9'). |
| Experiment Setup | Yes | We search over learning rates lr ∈ [1, 2, 4, 8, 12, 16], momentum parameters µ ∈ [0, 0.1, 0.5, 0.9, 1], and dampening factors d ∈ [0, 0.1, 0.5, 0.9, 1]. We use exponential moving average (EMA) on the weights (Tan & Le, 2019) with a decay parameter in [0.9, 0.99, 0.999, 0.9999, 0.99999]. |