Stochastic Optimization Schemes for Performative Prediction with Nonconvex Loss

Authors: Qiang Li, Hoi-To Wai

NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility assessment: each variable below is listed with its result and the supporting LLM response.
Research Type: Experimental
LLM Response: "Numerical experiments corroborate our theories. We consider two examples of performative prediction with non-convex loss based on synthetic data and real data. All simulations are performed with PyTorch on a server using an Intel Xeon 6318 CPU."

Researcher Affiliation: Academia
LLM Response: "Qiang Li, Hoi-To Wai. Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong, Shatin, Hong Kong SAR of China. {liqiang, htwai}@se.cuhk.edu.hk"

Pseudocode: No
LLM Response: The paper describes its algorithms verbally and with mathematical equations (e.g., equation (2) for SGD-GD and equation (23) for lazy deployment), but it does not include explicit pseudocode or algorithm blocks labeled as such.

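For orientation, the SGD-GD scheme in equation (2) is the greedy-deploy stochastic update θ_{t+1} = θ_t − γ_t ∇ℓ(θ_t; z_t), where z_t is drawn from the distribution induced by the currently deployed model, while lazy deployment (equation (23)) redeploys only once every K steps. The following is a minimal PyTorch sketch of that loop, not the authors' implementation; `loss_fn` and `sample_from_D` are hypothetical placeholders for the nonconvex loss ℓ(θ; z) and the performative distribution D(θ).

```python
import torch

def sgd_gd(theta0, loss_fn, sample_from_D, gamma, T, K=1):
    """Minimal sketch of SGD-GD (eq. (2)) with optional lazy deployment (eq. (23)).

    theta0        : initial parameter tensor
    loss_fn       : callable (theta, z) -> scalar loss ell(theta; z)  [placeholder]
    sample_from_D : callable theta -> one sample z ~ D(theta)         [placeholder]
    gamma         : constant stepsize gamma_t = gamma
    T             : number of stochastic gradient steps
    K             : deploy the model every K steps (K = 1 is greedy deploy)
    """
    theta = theta0.detach().clone().requires_grad_(True)
    deployed = theta.detach().clone()          # the distribution reacts to this model
    for t in range(T):
        z = sample_from_D(deployed)            # z_t ~ D(theta_deployed)
        loss = loss_fn(theta, z)
        (grad,) = torch.autograd.grad(loss, theta)
        with torch.no_grad():
            theta -= gamma * grad              # theta_{t+1} = theta_t - gamma * grad
        if (t + 1) % K == 0:                   # lazy deployment: redeploy every K steps
            deployed = theta.detach().clone()
    return theta.detach()
```

With K = 1 this reduces to greedy deployment; the b = 1 setting quoted in the experiment setup below corresponds to sampling a single z per step.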
Open Source Code: Yes
LLM Response: "The paper will provide open access to the source code, ensuring that the main experimental results can be faithfully reproduced."

Open Datasets: Yes
LLM Response: "Our second example deals with the task of training a neural network (NN) on the spambase Hopkins et al. [1999] dataset with m = 4601 samples, each with d = 57 features."

Dataset Splits: No
LLM Response: "We split the training/test sets as 8 : 2." Although a split ratio is quoted, the paper does not specify the exact partition (e.g., random seed or sample indices).

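For context, an 8:2 split like the one quoted can be reproduced only up to this unstated randomness; the sketch below is one plausible reading, where the file path `spambase.data` and the fixed seed are assumptions rather than details from the paper.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# spambase.data: 4601 comma-separated rows; the last column is the 0/1 spam label.
# The path and random seed are assumptions; the paper only states the 8:2 ratio.
data = np.loadtxt("spambase.data", delimiter=",")
X, y = data[:, :-1], data[:, -1]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
```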
Hardware Specification: Yes
LLM Response: "All simulations are performed with PyTorch on a server using an Intel Xeon 6318 CPU."

Software Dependencies: No
LLM Response: "All simulations are performed with PyTorch on a server using an Intel Xeon 6318 CPU." PyTorch is named, but no version numbers or a fuller dependency list are given.

Experiment Setup: Yes
LLM Response: "For (2), the batch size is b = 1 and the stepsize is γ_t = γ = 1/√T with T = 10^6. In our experiment, we set ϵ_NN ∈ {0, 10, 100} and the batch size as b = 8. For SGD-GD, we use γ_t = γ = 200/√T, and for lazy deployment, we use γ = 200/(K√T) with T = 10^5. The NN encoded in f_θ(x) consists of three fully-connected layers with tanh activation and a sigmoid output layer, i.e., f_θ(x) = Sigmoid(θ^(1) tanh(θ^(2) tanh(θ^(3) x))), where θ^(i) := [w^(i); b^(i)] concatenates the weight and bias for each layer, with d_1 = 10, d_2 = 50, d_3 = 57 neurons, making a total of d = 3421 parameters for θ."
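
Reading d_i as the input width of layer θ^(i) (an assumption, but one consistent with the quoted total: 57·50 + 50 + 50·10 + 10 + 10·1 + 1 = 3421 parameters), the architecture can be sketched in PyTorch as follows; the class name `SpamNN` is hypothetical.

```python
import torch
import torch.nn as nn

class SpamNN(nn.Module):
    """Sketch of f_θ(x) = Sigmoid(θ^(1) tanh(θ^(2) tanh(θ^(3) x)))."""

    def __init__(self, d3=57, d2=50, d1=10):
        super().__init__()
        self.theta3 = nn.Linear(d3, d2)  # θ^(3): 57 -> 50, weight w^(3) and bias b^(3)
        self.theta2 = nn.Linear(d2, d1)  # θ^(2): 50 -> 10
        self.theta1 = nn.Linear(d1, 1)   # θ^(1): 10 -> 1 (scalar output)

    def forward(self, x):
        h = torch.tanh(self.theta3(x))
        h = torch.tanh(self.theta2(h))
        return torch.sigmoid(self.theta1(h))

model = SpamNN()
n_params = sum(p.numel() for p in model.parameters())
assert n_params == 3421, n_params  # matches d = 3421 quoted above
```

The final assertion checks that the assumed layer dimensions reproduce the parameter count stated in the quote.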