Stochastic Optimization Schemes for Performative Prediction with Nonconvex Loss
Authors: Qiang LI, Hoi-To Wai
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Numerical experiments corroborate our theories. We consider two examples of performative prediction with non-convex loss, based on synthetic data and real data. All simulations are performed with PyTorch on a server using an Intel Xeon 6318 CPU. |
| Researcher Affiliation | Academia | Qiang Li, Hoi-To Wai; Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong, Shatin, Hong Kong SAR of China; {liqiang, htwai}@se.cuhk.edu.hk |
| Pseudocode | No | The paper describes its algorithms verbally and with mathematical equations (e.g., equation (2) for SGD-GD and equation (23) for lazy deployment), but does not include explicit pseudocode or labeled algorithm blocks. (A sketch of both update schemes appears after the table.) |
| Open Source Code | Yes | The paper will provide open access to the source code, ensuring that the main experimental results can be faithfully reproduced. |
| Open Datasets | Yes | Our second example deals with the task of training a neural network (NN) on the spambase Hopkins et al. [1999] dataset with m = 4601 samples, each with d = 48 features. |
| Dataset Splits | Yes | We split the training/test sets as 8 : 2. (A sketch of this split appears after the table.) |
| Hardware Specification | Yes | All simulations are performed with PyTorch on a server using an Intel Xeon 6318 CPU. |
| Software Dependencies | No | The paper states that all simulations are performed with PyTorch, but it does not specify version numbers for PyTorch or any other software dependencies. |
| Experiment Setup | Yes | For (2), the batch size is b = 1 and the stepsize is γ_t = γ = 1/√T with T = 10^6. In our experiment, we set ε_NN ∈ {0, 10, 100} and the batch size to b = 8. For SGD-GD, we use γ_t = γ = 200/√T, and for lazy deployment, we use γ = 200/(K√T) with T = 10^5. The NN encoded in f_θ(x) consists of three fully-connected layers with tanh activation and a sigmoid output layer, i.e., f_θ(x) = Sigmoid(θ^(1) tanh(θ^(2) tanh(θ^(3) x))), where θ^(i) := [w^(i); b^(i)] concatenates the weight and bias of each layer, with d_1 = 10, d_2 = 50, d_3 = 57 neurons, for a total of d = 3421 parameters in θ. (A sketch of this architecture appears after the table.) |
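
Below is a minimal sketch of the two update schemes quoted above, assuming the standard greedy-deployment form of SGD-GD from the performative prediction literature (the paper's equations (2) and (23) are described only verbally here). The helpers `sample_from_D` and `loss_fn`, and the example value of K, are hypothetical placeholders rather than the authors' code.

```python
import torch

# Minimal sketch of SGD-GD (greedy deployment) and lazy deployment.
# `sample_from_D(theta, b)` draws a batch Z ~ D(theta) and `loss_fn(theta, z)`
# is the (possibly nonconvex) sampled loss; both are hypothetical placeholders.

def sgd_gd(theta, sample_from_D, loss_fn, T=10**6, batch_size=1, gamma=None):
    """theta_{t+1} = theta_t - gamma * grad ell(theta_t; Z_t), with Z_t ~ D(theta_t)."""
    gamma = gamma if gamma is not None else 1.0 / T**0.5  # e.g. gamma = 1/sqrt(T)
    for _ in range(T):
        theta = theta.detach().requires_grad_(True)
        z = sample_from_D(theta, batch_size)              # deploy the current model greedily
        (grad,) = torch.autograd.grad(loss_fn(theta, z), theta)
        theta = theta - gamma * grad
    return theta.detach()

def sgd_lazy(theta, sample_from_D, loss_fn, T=10**5, K=10, batch_size=8):
    """Deploy only every K iterations; samples follow the last deployed model."""
    gamma = 200.0 / (K * T**0.5)                          # gamma = 200/(K*sqrt(T)) as above
    deployed = theta.detach()
    for t in range(T):
        if t % K == 0:
            deployed = theta.detach()                     # (re)deploy the current model
        z = sample_from_D(deployed, batch_size)           # Z_t ~ D(last deployed model)
        theta = theta.detach().requires_grad_(True)
        (grad,) = torch.autograd.grad(loss_fn(theta, z), theta)
        theta = theta - gamma * grad
    return theta.detach()
```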
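
For the 8 : 2 train/test split on spambase, a hypothetical sketch follows; the UCI file layout and the use of scikit-learn are assumptions, as the paper does not state how the split is implemented.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical loading of the UCI spambase file: 4601 rows, features in the
# leading columns, binary spam label in the last column.
data = np.loadtxt("spambase.data", delimiter=",")
X, y = data[:, :-1], data[:, -1]

# 8 : 2 train/test split as stated in the paper (seed chosen arbitrarily here).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
```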
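
Finally, a minimal sketch of the stated network. The layer shapes (input dimension 57, hidden widths 50 and 10, scalar output) are an assumption inferred from the quoted neuron counts, chosen because they reproduce the stated total of d = 3421 parameters; they are not taken from the authors' code.

```python
import torch
import torch.nn as nn

# Sketch of f_theta(x) = Sigmoid(theta^(1) tanh(theta^(2) tanh(theta^(3) x))).
# Assumed shapes: theta^(3): 57 -> 50, theta^(2): 50 -> 10, theta^(1): 10 -> 1,
# giving 2900 + 510 + 11 = 3421 parameters, which matches d = 3421.
class SpamNet(nn.Module):
    def __init__(self, d_in=57, d_h1=50, d_h2=10):
        super().__init__()
        self.theta3 = nn.Linear(d_in, d_h1)  # innermost layer, applied to x first
        self.theta2 = nn.Linear(d_h1, d_h2)
        self.theta1 = nn.Linear(d_h2, 1)     # output layer

    def forward(self, x):
        h = torch.tanh(self.theta3(x))
        h = torch.tanh(self.theta2(h))
        return torch.sigmoid(self.theta1(h))

net = SpamNet()
print(sum(p.numel() for p in net.parameters()))  # 3421
```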