Stochastic Optimization for Non-convex Inf-Projection Problems

Authors: Yan Yan, Yi Xu, Lijun Zhang, Xiaoyu Wang, Tianbao Yang

ICML 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct experiments to verify the efficacy of the inf-projection formulation and proposed stochastic algorithms in comparison to the stochastic algorithms for solving the min-max formulation (9). We perform two experiments on four datasets, i.e., a9a, RCV1, covtype and URL from the libsvm website, whose number of examples are n = 32561, 581012, 697641 and 2396130, respectively (Table 2). For each dataset, we randomly sample 80% as training data and the rest as testing data.
Researcher Affiliation | Collaboration | (1) University of Iowa, (2) DAMO Academy, Alibaba Group, (3) Nanjing University, (4) The Chinese University of Hong Kong (Shenzhen).
Pseudocode | Yes | Algorithm 1 MSPG, Algorithm 2 St-SPG, Algorithm 3 SPG
Open Source Code | No | The paper does not provide concrete access to source code for the methodology described, such as a specific repository link, an explicit code release statement, or code in supplementary materials.
Open Datasets | Yes | We perform two experiments on four datasets, i.e., a9a, RCV1, covtype and URL from the libsvm website, whose number of examples are n = 32561, 581012, 697641 and 2396130, respectively (Table 2).
Dataset Splits | No | The paper states, 'For each dataset, we randomly sample 80% as training data and the rest as testing data,' but does not provide specific details on a separate validation split. (A minimal loading-and-splitting sketch is given after the table.)
Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments.
Software Dependencies | No | The paper does not provide specific ancillary software details, such as library or solver names with version numbers, needed to replicate the experiment.
Experiment Setup | Yes | We tune hyper-parameters from a reasonable range, i.e., for St-SPG, λ ∈ 10^{-5:2}, γ, µ ∈ 10^{-3:3}. For BMD and BMD-eff, we tune step size η_P ∈ 10^{-8:-1}·5 for updating P, step size η_θ ∈ 10^{-5:3} for updating θ, ρ ∈ n·10^{-3:3}, and fix δ = 10^{-5}. For MSPG, we tune λ ∈ 10^{-5:2} and the step size parameter c in Proposition 1 from 10^{-5:2}. Hyper-parameters of PGSMD and PGSMD-eff, including η_P, η_θ, ρ and δ, are selected from the same ranges as in the first experiment. The weak convexity parameter ρ_wc is chosen from 10^{-5:5}. (These grids are spelled out in a sketch after the table.)
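
For concreteness, the loading-and-splitting step quoted in the rows above can be sketched as follows. This is a minimal sketch assuming scikit-learn and locally downloaded LIBSVM-format files; the file path, the helper name load_and_split, and the random seed are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch of the data preparation quoted above: load one of the
# LIBSVM-format datasets (a9a, RCV1, covtype, URL) and randomly split it
# into 80% training / 20% testing data. File path and seed are assumptions.
from sklearn.datasets import load_svmlight_file
from sklearn.model_selection import train_test_split

def load_and_split(libsvm_path, seed=0):
    """Return (X_train, X_test, y_train, y_test) from a LIBSVM-format file."""
    X, y = load_svmlight_file(libsvm_path)  # sparse feature matrix and labels
    return train_test_split(X, y, test_size=0.2, random_state=seed)

# Example usage with an assumed local copy of the a9a file:
# X_tr, X_te, y_tr, y_te = load_and_split("a9a")
```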
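
The hyper-parameter ranges in the Experiment Setup row use the shorthand 10^{a:b}, read here as the set of integer powers {10^a, 10^{a+1}, ..., 10^b}. The sketch below spells out those grids under that reading; the helper powers_of_ten, the dictionary keys, and the exact reading of the partly garbled η_P range are assumptions rather than the authors' own code.

```python
# Hedged sketch of the hyper-parameter search spaces quoted above, reading
# 10^{a:b} as the integer powers {10^a, ..., 10^b}. Names and the exact
# reading of the eta_P range are assumptions, not the paper's own code.
import numpy as np

def powers_of_ten(lo, hi):
    """Grid of integer powers of ten: {10^lo, 10^(lo+1), ..., 10^hi}."""
    return 10.0 ** np.arange(lo, hi + 1)

n = 32561  # number of examples of the dataset at hand (a9a as an example)

grids = {
    # St-SPG
    "lambda": powers_of_ten(-5, 2),
    "gamma": powers_of_ten(-3, 3),
    "mu": powers_of_ten(-3, 3),
    # BMD / BMD-eff (PGSMD / PGSMD-eff are tuned over the same ranges)
    "eta_P": 5 * powers_of_ten(-8, -1),  # uncertain reading of the quoted range
    "eta_theta": powers_of_ten(-5, 3),
    "rho": n * powers_of_ten(-3, 3),
    "delta": np.array([1e-5]),           # fixed, not tuned
    # MSPG
    "c": powers_of_ten(-5, 2),
    # weak convexity parameter
    "rho_wc": powers_of_ten(-5, 5),
}
```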