The Role of Adaptive Optimizers for Honest Private Hyperparameter Selection

Authors: Shubhankar Mohapatra, Sajin Sasy, Xi He, Gautam Kamath, Om Thakkar

AAAI 2022, pp. 7806-7813 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We perform a comprehensive empirical evaluation of the proposed theoretical method... We empirically and theoretically demonstrate... We empirically show that the DPAdam optimizer... The new optimizer is compared with DPAdam and ADADP. For brevity, we show experiments on σ = 4 and others appear in the full version (Mohapatra et al. 2021). In Figure 5, we show the maximum and median accuracy curves for all the optimizers.
Researcher Affiliation | Collaboration | ¹University of Waterloo, ²Google
Pseudocode | No | The proof for Theorem 3 and the pseudo-code for DPAdam WOSM are provided in our full version (Mohapatra et al. 2021).
Open Source Code | No | The paper does not provide an explicit statement about, or link to, open-source code for the described methodology.
Open Datasets | Yes | We repeat the same experiment over the ENRON dataset and observe similar trends (Figures 2(c) and 2(d)). ... we evaluate this private optimizer over four diverse datasets and two learning models, including logistic regression and a neural network with a single hidden layer of 100 neurons (TLNN).
Dataset Splits | No | The dataset has been partitioned into the training set and the validation set.
Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments, such as CPU/GPU models or memory specifications.
Software Dependencies | No | The paper does not specify software dependencies with version numbers (e.g., Python, PyTorch, or TensorFlow versions) that would be needed for reproducibility.
Experiment Setup | Yes | The grids for each optimizer are shown in Table 1, where DPSGD has 40 candidates to tune over and DPAdam has 4 with fixed α = 0.001, β1 = 0.9, β2 = 0.999. ... For each dataset and model, we run DPAdam three times with hyperparameters (α, β1, β2) from the grids α ∈ {0.001, 0.05, 0.01, 0.2, 0.5}, β1, β2 ∈ {0.8, 0.85, 0.9, 0.95, 0.99, 0.999}. ... we fix a constant lot size (L = 250), and consider tuning over three different noise levels, σ ∈ {2, 4, 8}, ... we also fix the clipping threshold C = 0.5, and T = 2500 iterations of training.
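
For reference, the tuning setup quoted in the Experiment Setup row can be summarized as a minimal sketch. This assumes PyTorch; the TLNN definition is inferred from the "one hidden layer of 100 neurons" description above, and run_dp_training is a hypothetical placeholder for the paper's DP training loop (per-example clipping and noise addition are not shown here).

    # Sketch of the hyperparameter grids and fixed settings quoted above.
    from itertools import product

    import torch.nn as nn

    # Fixed DP training settings quoted in the paper
    LOT_SIZE = 250            # constant lot size L
    CLIP_NORM = 0.5           # clipping threshold C
    ITERATIONS = 2500         # T iterations of training
    NOISE_LEVELS = [2, 4, 8]  # noise multipliers sigma tuned over

    # DPAdam grid quoted above: 5 learning rates x 6 beta1 x 6 beta2 values
    ALPHAS = [0.001, 0.05, 0.01, 0.2, 0.5]
    BETAS = [0.8, 0.85, 0.9, 0.95, 0.99, 0.999]
    DPADAM_GRID = list(product(ALPHAS, BETAS, BETAS))

    def make_tlnn(n_features: int, n_classes: int) -> nn.Module:
        """Two-layer neural network (TLNN): one hidden layer of 100 neurons."""
        return nn.Sequential(
            nn.Linear(n_features, 100),
            nn.ReLU(),
            nn.Linear(100, n_classes),
        )

    # Example sweep (run_dp_training is hypothetical):
    # for sigma in NOISE_LEVELS:
    #     for alpha, beta1, beta2 in DPADAM_GRID:
    #         run_dp_training(make_tlnn(d, k), alpha, beta1, beta2,
    #                         sigma, LOT_SIZE, CLIP_NORM, ITERATIONS)

The sketch only enumerates the candidate configurations; it does not reproduce the paper's private selection procedure or the DPAdam WOSM variant, whose pseudo-code is deferred to the full version (Mohapatra et al. 2021).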