Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Faster Rates of Differentially Private Stochastic Convex Optimization
Authors: Jinyan Su, Lijie Hu, Di Wang
JMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we conduct experiments of our new methods on real-world data. Experimental results also provide new insights into established theories. 6. Experiments In this section, we provide experimental studies to compare the effectiveness of the proposed methods for several problems satisfying TNC. Experimental Settings For the instances satisfying TNC, here we study three examples that have been studied in the previous related work such as (Liu et al., 2018; Xu et al., 2017). ... Dataset and Parameter Settings We will implement all the above methods on four real-world datasets from the libsvm website, namely a8a (n = 22,696, d = 123 for training, and n = 9,865 for testing), a9a (n = 32,561, d = 123 for training, and n = 16,281 for testing), ijcnn1 (n = 49,990, d = 22 for training, and n = 91,701 for testing), and w7a (n = 24,692, d = 300 for training, and n = 25,057 for testing). ... Experimental Results In Figure 1, we show the performance of Iterated SGD with different θ compared with three baseline methods for ℓ2-norm regularized logistic regression. |
| Researcher Affiliation | Academia | Jinyan Su, Cornell University; Lijie Hu, Provable Responsible AI and Data Analytics Lab, Division of CEMSE, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia; Di Wang, Provable Responsible AI and Data Analytics Lab, Division of CEMSE, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia |
| Pseudocode | Yes | Algorithm 1 Phased-SGD(w0, η, n, W) (Feldman et al., 2020); Algorithm 2 Private Stochastic Approximation(w1, n, R0); Algorithm 3 Phased-SGD-SC(w0, γ, ϵ, δ, W); Algorithm 4 Private Stochastic Approximation-II(w0, n, W); Algorithm 5 Iterated Phased-SGD(w1, n, W, θ); Algorithm 6 Phased-ERM(w0, η, n, W) (Feldman et al., 2020); Algorithm 7 Epoch-DP-SGD(η1, n1, n, w0); Algorithm 8 Faster-DPSGD-SC |
| Open Source Code | No | The paper describes various algorithms and methods but does not provide any specific links to source code repositories or explicitly state that the code for this work is being released. |
| Open Datasets | Yes | We will implement all the above methods on four real-world datasets from the libsvm website, namely a8a (n = 22,696, d = 123 for training, and n = 9,865 for testing), a9a (n = 32,561, d = 123 for training, and n = 16,281 for testing), ijcnn1 (n = 49,990, d = 22 for training, and n = 91,701 for testing), and w7a (n = 24,692, d = 300 for training, and n = 25,057 for testing). For each sample in each dataset, we preprocess it to make its feature vector satisfy ∥x∥₁ ≤ 1 so that the loss function will be Lipschitz for some constant. https://www.csie.ntu.edu.tw/~cjlin/libsvm/ |
| Dataset Splits | No | Dataset and Parameter Settings We will implement all the above methods on four real-world datasets from the libsvm website, namely a8a (n = 22,696, d = 123 for training, and n = 9,865 for testing), a9a (n = 32,561, d = 123 for training, and n = 16,281 for testing), ijcnn1 (n = 49,990, d = 22 for training, and n = 91,701 for testing), and w7a (n = 24,692, d = 300 for training, and n = 25,057 for testing). The paper specifies the number of training and testing samples for each dataset but does not provide details on the methodology used to create these splits (e.g., random seed, stratification, or specific predefined splits from the libsvm website itself beyond just the sample counts). |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used to run the experiments, such as GPU or CPU models, or cloud computing specifications. |
| Software Dependencies | No | The paper discusses implementing methods but does not provide specific version numbers for any software libraries, frameworks, or programming languages used. |
| Experiment Setup | Yes | Our approach involves conducting hyperparameter tuning to yield optimal outcomes, and we will present the results based on the selected hyperparameters. ... Here we will set the parameter λ = 10⁻³. ... We will set δ = 1/n^1.1 for all experiments. ... When presenting the results for different privacy budgets ϵ, we will use n = 10⁴ samples and choose ϵ = {0.5, 1, 1.5, 2} respectively. |
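The quoted Experiment Setup row can be made concrete with a small sketch. This is not the authors' code; it is a minimal, illustrative reconstruction of the reported parameter settings (δ = 1/n^1.1, ϵ ∈ {0.5, 1, 1.5, 2}, λ = 10⁻³, n = 10⁴), with the function name and dictionary layout chosen here for clarity.

```python
# Illustrative sketch of the experiment settings quoted above (hypothetical
# names; not code released by the paper).
def privacy_settings(n: int) -> dict:
    """Reconstruct the (epsilon, delta, lambda) configuration reported in the paper."""
    delta = 1.0 / n ** 1.1            # paper sets delta = 1 / n^{1.1} for all experiments
    epsilons = [0.5, 1.0, 1.5, 2.0]   # privacy budgets swept in the experiments
    lam = 1e-3                        # regularization parameter lambda = 10^{-3}
    return {"delta": delta, "epsilons": epsilons, "lambda": lam}

settings = privacy_settings(10_000)   # paper uses n = 10^4 samples for the epsilon sweep
```

This mirrors only the numbers stated in the excerpt; the actual hyperparameter-tuning procedure is not described in enough detail to reconstruct.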