Sparsity-Preserving Differentially Private Training of Large Embedding Models

Authors: Badih Ghazi, Yangsibo Huang, Pritish Kamath, Ravi Kumar, Pasin Manurangsi, Amer Sinha, Chiyuan Zhang

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section we evaluate the performance of our sparsity-preserving training algorithms and compare them against vanilla DP-SGD on both the recommendation and language understanding tasks (Section 4.1).
Researcher Affiliation | Collaboration | Badih Ghazi, Google Research, Mountain View, CA, badihghazi@gmail.com; Yangsibo Huang, Princeton University, Princeton, NJ, yangsibo@princeton.edu
Pseudocode | Yes | Algorithm 1: DP-AdaFEST: Adaptive Filtering-Enabled Sparse Training.
Open Source Code | No | The paper references Google's DP library and provides a link to its GitHub repository for privacy accounting, but it does not state that the code for the methodology described in this paper is open-source or publicly available.
Open Datasets | Yes | We evaluate our algorithms on the widely-used Criteo predicted click-through rate (pCTR) dataset, which includes over four billion ad impressions over 24 days. ... https://ailab.criteo.com/download-criteo-1tb-click-logs-dataset ... For language understanding tasks, we employ language models from the BERT family [DCLT19]. ... We fine-tune the RoBERTa model for downstream classification tasks from the GLUE benchmark [WSM+19], including SST-2 [SPW+13], QNLI [RZLL16], and QQP [IYW+17].
Dataset Splits | Yes | To simulate a real-world online training scenario, we train the model on the first 18 days of data and evaluate it on subsequent days (i.e., days 19–24).
Hardware Specification | No | The paper mentions 'Google TPUs' as a motivation for sparsity but does not provide specific hardware details (such as exact GPU/CPU models, processor types, or TPU versions) used for running its main experiments or simulations.
Software Dependencies | No | The paper mentions software like 'TensorFlow' and 'tf.keras.layers.Embedding' but does not provide specific version numbers for these or any other ancillary software components used in the experiments.
Experiment Setup | Yes | For DP-SGD, we fine-tune the clipping norm and report the best accuracy achieved. When evaluating DP-FEST, we adjust the hyper-parameter k, which represents the number of preserved top buckets, with values ranging from 100 to 300,000. For DP-AdaFEST, we tune the following hyper-parameters: the ratio of the noise added to the contribution map to that added to the sparse gradient, σ1/σ2, with options from 0.1 to 10; the thresholding value τ ∈ {0.5, 1.0, 5.0, 10.0, 20.0, 50.0, 100.0}; and the clipping norm for the gradient contribution, C1 ∈ {1.0, 5.0, 10.0}.
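
The sketches below expand on the Pseudocode, Open Datasets, Dataset Splits, and Experiment Setup rows; all are illustrative reconstructions, not the authors' code.

The Pseudocode row refers to Algorithm 1 (DP-AdaFEST). Below is a minimal NumPy sketch of one such update step, based only on the hyper-parameters quoted in the Experiment Setup row (τ, C1, σ1/σ2); every function and variable name is an assumption, and a second clipping norm c2 for the sparse gradient is introduced purely for illustration.

```python
import numpy as np

def adafest_step(per_example_grads, tau, c1, c2, sigma1, sigma2, rng):
    """One hypothetical DP-AdaFEST-style embedding update.

    per_example_grads: array of shape (batch, vocab, dim) holding each
    example's gradient with respect to the embedding table.
    """
    batch, vocab, dim = per_example_grads.shape

    # 1. Per-example contribution map: 1 where an example touches a bucket,
    #    clipped to L2 norm at most c1 (the "clipping norm for gradient
    #    contribution" from the Experiment Setup row).
    contrib = (np.abs(per_example_grads).sum(axis=2) > 0).astype(float)
    norms = np.linalg.norm(contrib, axis=1, keepdims=True)
    contrib *= np.minimum(1.0, c1 / np.maximum(norms, 1e-12))

    # 2. Add Gaussian noise (scale sigma1) to the aggregated contribution map
    #    and adaptively keep only buckets above the threshold tau.
    noisy_contrib = contrib.sum(axis=0) + rng.normal(0.0, sigma1 * c1, size=vocab)
    selected = noisy_contrib > tau

    # 3. DP-SGD on the selected rows only: clip each example's gradient to L2
    #    norm c2, sum, and add Gaussian noise (scale sigma2) to selected rows.
    g = per_example_grads.copy()
    gnorms = np.linalg.norm(g.reshape(batch, -1), axis=1).reshape(batch, 1, 1)
    g *= np.minimum(1.0, c2 / np.maximum(gnorms, 1e-12))

    noisy_grad = np.zeros((vocab, dim))
    noisy_grad[selected] = g.sum(axis=0)[selected] + rng.normal(
        0.0, sigma2 * c2, size=(int(selected.sum()), dim))
    return noisy_grad  # exactly zero outside selected buckets: sparsity preserved
```

Rows outside the selected set receive neither gradient nor noise, which is what distinguishes this from vanilla DP-SGD, where Gaussian noise densifies the entire embedding gradient.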
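The GLUE tasks cited in the Open Datasets row (SST-2, QNLI, QQP) are publicly downloadable. One way to fetch them, assuming the Hugging Face datasets library (a tooling choice the paper does not mention), is:

```python
from datasets import load_dataset

# Download the three GLUE classification tasks named in the paper.
sst2 = load_dataset("glue", "sst2")
qnli = load_dataset("glue", "qnli")
qqp = load_dataset("glue", "qqp")

for name, ds in [("SST-2", sst2), ("QNLI", qnli), ("QQP", qqp)]:
    print(name, {split: ds[split].num_rows for split in ds})
```

The Criteo pCTR logs are distributed separately, via the URL quoted in the Open Datasets row.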
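The Dataset Splits row describes a temporal split rather than a random one. A hypothetical rendering, assuming each record carries a day field (the paper does not describe its data pipeline):

```python
TRAIN_DAYS = set(range(1, 19))   # days 1-18 for training
EVAL_DAYS = list(range(19, 25))  # days 19-24, each evaluated separately

def temporal_split(records):
    """Split records by day to mimic the paper's online-training setup."""
    train = [r for r in records if r["day"] in TRAIN_DAYS]
    evals = {d: [r for r in records if r["day"] == d] for d in EVAL_DAYS}
    return train, evals
```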
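Finally, the Experiment Setup row implies a grid search for DP-AdaFEST. A sketch of that grid with itertools.product follows; the intermediate σ1/σ2 points are an assumption, since the quote gives only the range 0.1 to 10:

```python
import itertools

noise_ratios = [0.1, 0.3, 1.0, 3.0, 10.0]        # sigma1 / sigma2 (assumed spacing)
taus = [0.5, 1.0, 5.0, 10.0, 20.0, 50.0, 100.0]  # thresholding value tau
c1s = [1.0, 5.0, 10.0]                           # contribution clipping norm C1

grid = [
    {"noise_ratio": r, "tau": t, "c1": c}
    for r, t, c in itertools.product(noise_ratios, taus, c1s)
]
print(len(grid))  # 105 configurations under the assumed ratio grid
```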