Continuized Acceleration for Quasar Convex Functions in Non-Convex Optimization

Authors: Jun-Kun Wang, Andre Wibisono

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We compare the proposed continuized acceleration with GD and the accelerated method of Hinder et al. (2020) (AGD). For the method of Hinder et al. (2020), we use their implementation available online (Hinder et al., 2021).
Researcher Affiliation | Academia | Jun-Kun Wang and Andre Wibisono, Department of Computer Science, Yale University, {jun-kun.wang,andre.wibisono}@yale.edu
Pseudocode | Yes | APPENDIX A: ALGORITHMS OF HINDER ET AL. (2020). We replicate the algorithms in Hinder et al. (2020) using our notations for the reader's reference. Their algorithms use a subroutine of binary search to determine the mixing parameter τ_k. Algorithm 1: AGD for (ρ, µ)-strongly quasar convex function minimization in Hinder et al. (2020) (...) Algorithm 3: BINARYLINESEARCH(f, w, z, b, c, ϵ, [guess]) (Hinder et al., 2020)
Open Source Code | No | The paper mentions open-source code for a baseline method (Hinder et al., 2020) that they compare against, but does not state that their own proposed method's code is available.
Open Datasets | No | The data used in the experiments is synthetic, generated by sampling from a normal distribution and a specified link function: 'Each data point x_i is sampled from the normal distribution N(0, I_d) and the label y_i is generated as y_i = σ(w_*^⊤ x_i), where w_* ∼ N(0, I_d) is the true vector and σ(·) is the link function.'
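The quoted generation procedure can be sketched as follows. The use of NumPy, the random seed, and the choice of the logistic sigmoid as the link function σ are illustrative assumptions; the paper considers quasar convex objectives induced by various link functions.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed for reproducibility (assumption)

n, d = 1000, 50  # number of samples and dimension, as in the paper's setup

# True parameter vector w_* ~ N(0, I_d)
w_star = rng.standard_normal(d)

# Features x_i ~ N(0, I_d), stacked row-wise into an (n, d) matrix
X = rng.standard_normal((n, d))

# Labels y_i = sigma(w_*^T x_i); the logistic sigmoid stands in for
# the link function here (assumption).
def sigma(t):
    return 1.0 / (1.0 + np.exp(-t))

y = sigma(X @ w_star)
```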
Dataset Splits | No | The paper describes the generation of synthetic data and its size ('n = 1000 and the dimension d = 50') but does not specify any train/validation/test dataset splits.
Hardware Specification | No | The paper mentions 'CPU time' as a performance metric but does not provide any specific hardware details such as GPU or CPU models, or memory specifications.
Software Dependencies | No | The paper does not provide specific version numbers for any software libraries or dependencies used for the experiments.
Experiment Setup | Yes | In the experiments, we set the number of samples n = 1000 and the dimension d = 50. The initial point of all the algorithms w_0 ∈ R^d is a close-to-zero point, and is sampled as w_0 = 10^{-2} ζ, where ζ ∼ N(0, I_d). (...) We instead use the grid search and report the result under the best configuration of these parameters for each method. More precisely, we search L and µ over {…, 10^q, 5·10^q, 10^{q+1}, …} with the constraint that L > µ, where q ∈ {−2, −1, …, 4}, and search ρ ∈ {0.01, 0.1, 0.5}.
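The hyperparameter grid described in the quote can be enumerated as a sanity check. The exact endpoints of the {…, 10^q, 5·10^q, 10^{q+1}, …} set are elided in the text, so restricting the mantissas to {1, 5} with q ∈ {−2, …, 4} is an assumption:

```python
import itertools

# Candidate values for L and mu: m * 10^q for m in {1, 5}, q in {-2, ..., 4}
# (the endpoints of the grid are an assumption; the paper's set notation is
# open-ended).
mantissas = [1.0, 5.0]
Lmu_values = sorted(m * 10.0**q for q in range(-2, 5) for m in mantissas)
rho_values = [0.01, 0.1, 0.5]

# All configurations subject to the constraint L > mu from the paper.
configs = [
    (L, mu, rho)
    for L, mu, rho in itertools.product(Lmu_values, Lmu_values, rho_values)
    if L > mu
]
```

Under these assumptions the grid has 14 candidate values for each of L and µ, giving 91 admissible (L, µ) pairs and 273 configurations per method.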