Fast Rate Analysis of Some Stochastic Optimization Algorithms
Authors: Chao Qu, Huan Xu, Chong Jin Ong
ICML 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we perform numerical simulations to illustrate and validate our results. We emphasize that this paper concerns providing a tighter theoretical analysis of three well-known algorithms, rather than proposing new algorithms. As such, the goal of the simulation is not to show the superiority of the algorithms (which has been validated in many real applications (Ouyang et al., 2013; Suzuki, 2013; Xiao, 2009)), but to show that their empirical performance matches our theorems. In light of this, synthetic data is preferable in our experiments, as we are able to compute the global optimum (and hence the optimality gap) from the underlying mechanism that generates the data, which is typically impossible for real data sets. We use LASSO as the problem to optimize, i.e., $f(w, (x, y)) + r(w) = \frac{1}{2}\lVert y - w^T x\rVert_2^2 + \lambda\lVert w\rVert_1$, as this is a widely used formulation in areas such as sparse coding, compressive sensing, and high-dimensional statistics. We generate the input-output pair $(x, y)$ as follows: $w^*$ is a $k$-sparse vector of dimension $d$ whose non-zero entries are generated from the standard normal distribution, $x$ is a $d$-dimensional vector drawn from the standard normal distribution, and $y = (w^*)^T x + \xi$, where $\xi$ is a Gaussian noise term with variance $\sigma^2 = 0.25$. Observe that this satisfies our setting where $f(w, (x, y)) + r(w)$ is a weakly convex function, but $\mathbb{E}(f(w, (x, y)))$ is strongly convex. In LASSO, given the conditions above, we can calculate $G(w)$ analytically: $G(w) = \frac{1}{2}\lVert w - w^*\rVert_2^2 + \lambda\lVert w\rVert_1 + \frac{\sigma^2}{2}$. We set $\lambda = 0.1$. The optimal solution of $G(w)$ can be calculated by the standard soft-thresholding operation on $w^*$. The simulation results are reported in Figures 1 and 2. The Y axis is the optimality gap $G(\bar{w}_T) - G(w^*)$ and the X axis is the number of steps. All results are averaged over 10 trials and drawn on a log-log scale. |
| Researcher Affiliation | Academia | Chao Qu (A0117143@U.NUS.EDU), Department of Mechanical Engineering, National University of Singapore; Huan Xu (ISEXUH@NUS.EDU.SG), Department of Industrial and Systems Engineering, National University of Singapore; Chong Jin Ong (MPEONGCJ@NUS.EDU.SG), Department of Mechanical Engineering, National University of Singapore |
| Pseudocode | No | The paper describes iterative update rules for algorithms using mathematical equations, but does not present them in a formally labeled 'Algorithm' or 'Pseudocode' block. |
| Open Source Code | No | The paper does not provide any explicit statements or links indicating that source code for the methodology is openly available. |
| Open Datasets | No | The paper uses synthesized data: 'synthetic data is preferable in our experiments, as we are able to compute the global optimum (and hence the optimality gap) from the underlying mechanism that generates the data, which is typically impossible for real data sets. We use LASSO as the problem to optimize... We generate the input-output pair $(x, y)$ as follows: $w^*$ is a $k$-sparse vector of dimension $d$ whose non-zero entries are generated from the standard normal distribution, $x$ is a $d$-dimensional vector drawn from the standard normal distribution, and $y = (w^*)^T x + \xi$, where $\xi$ is a Gaussian noise term with variance $\sigma^2 = 0.25$.' |
| Dataset Splits | No | The paper uses synthesized data for simulation and evaluates performance based on optimality gap. It does not mention or specify any train/validation/test dataset splits. |
| Hardware Specification | No | The paper does not mention any specific hardware used for running the experiments. |
| Software Dependencies | No | The paper does not provide specific software names with version numbers that are dependencies for replicating the experiments. |
| Experiment Setup | Yes | We set $\lambda = 0.1$. The simulation results are reported in Figures 1 and 2. The Y axis is the optimality gap $G(\bar{w}_T) - G(w^*)$ and the X axis is the number of steps. All results are averaged over 10 trials and drawn on a log-log scale. |
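
For concreteness, below is a minimal Python sketch of the synthetic LASSO setup quoted in the table above. This is an illustration under stated assumptions, not the authors' code (none is released): the paper fixes only $\lambda = 0.1$ and $\sigma^2 = 0.25$, so the dimension `d`, sparsity `k`, and random seed here are hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed is an arbitrary illustrative choice

# Constants from the paper; d and k are assumed values, not stated in the paper.
d, k = 100, 10
lam = 0.1        # lambda = 0.1 (from the paper)
sigma2 = 0.25    # noise variance sigma^2 = 0.25 (from the paper)

# k-sparse ground truth w*: non-zero entries drawn from N(0, 1).
w_star = np.zeros(d)
support = rng.choice(d, size=k, replace=False)
w_star[support] = rng.standard_normal(k)

def sample_pair():
    """Draw one (x, y) pair: x ~ N(0, I_d), y = <w*, x> + xi.
    A stochastic solver (SGD, RDA, etc.) would call this at each step."""
    x = rng.standard_normal(d)
    y = w_star @ x + np.sqrt(sigma2) * rng.standard_normal()
    return x, y

def G(w):
    """Population objective: G(w) = 0.5*||w - w*||_2^2 + lam*||w||_1 + sigma^2/2."""
    return 0.5 * np.sum((w - w_star) ** 2) + lam * np.sum(np.abs(w)) + sigma2 / 2

# Analytic minimizer of G: coordinate-wise soft-thresholding of w* at level lam.
w_opt = np.sign(w_star) * np.maximum(np.abs(w_star) - lam, 0.0)

def optimality_gap(w_bar):
    """Optimality gap G(w_bar) - G(w*), the quantity on the Y axis of Figures 1-2."""
    return G(w_bar) - G(w_opt)

print(optimality_gap(np.zeros(d)))  # gap of the all-zeros initial point
```

The closed form of $G$ follows because $\mathbb{E}\big[\tfrac{1}{2}(y - w^T x)^2\big] = \tfrac{1}{2}\lVert w - w^*\rVert_2^2 + \tfrac{\sigma^2}{2}$ when $x \sim N(0, I_d)$ and $\xi \sim N(0, \sigma^2)$; the resulting objective separates across coordinates, so each coordinate of the minimizer is the soft-threshold of the corresponding coordinate of $w^*$.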