Faster Gradient-Free Algorithms for Nonsmooth Nonconvex Stochastic Optimization

Authors: Lesi Chen, Jing Xu, Luo Luo

ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we conduct numerical experiments on nonconvex penalized SVM and black-box adversarial attack to show the empirical superiority of the proposed GFM+.
Researcher Affiliation | Academia | 1 School of Data Science, Fudan University, Shanghai, China; 2 Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, China.
Pseudocode | Yes | Algorithm 1 GFM(x0, η, T) (a hedged sketch of this gradient-free update appears after this table).
Open Source Code | No | The paper does not explicitly state that the source code for the proposed method (GFM+) is publicly available, nor does it provide a repository link. It only thanks 'Zhichao Huang for sharing the code for black-box attack on CNN', which is code the authors used, not a release of their own method's code.
Open Datasets | Yes | We compare the proposed GFM+ with GFM (Lin et al., 2022) on the LIBSVM datasets a9a, w8a, covtype, ijcnn1, mushrooms, and phishing (Chang & Lin, 2011).
Dataset Splits | No | The paper mentions training models and using test sets but does not specify train/validation/test splits (e.g., percentages, sample counts, or an explicit splitting methodology) for the experiments.
Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., CPU, GPU models, memory, or cloud instances) used for running the experiments.
Software Dependencies | No | The paper describes the algorithms and models used (e.g., SGD, LeNet CNN) but does not provide specific software dependency details with version numbers (e.g., programming language versions, library versions, or framework versions).
Experiment Setup | Yes | We set δ = 0.001 and tune the stepsize η from {0.1, 0.01, 0.001} for the two algorithms. For GFM+, we tune both m and b in {1, 10, 100} and set b′ = mb, following our theory. We set θ = 4 and κ = 0.2 for our experiments. For both GFM and GFM+, we tune b′ from {500, 1000, 2000}. For GFM+, we additionally tune m from {10, 20, 50} and set b = b′/m. For both algorithms, we tune the initial learning rate η in {0.5, 0.05, 0.005} and decay it by 1/2 if there is no improvement within 10 iterations of one attack. We train the CNN with SGD for 100 epochs, with the stepsize starting at 0.1 and decaying by 1/2 every 20 epochs. (A bookkeeping sketch of the batch split and stepsize decay appears after this table.)
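The pseudocode row above names Algorithm 1 GFM(x0, η, T) but the page does not reproduce its body. Below is a minimal sketch of what such an update typically looks like, assuming the standard two-point randomized-smoothing estimator used by GFM in Lin et al. (2022); the oracle F, the smoothing radius delta, and all variable names are illustrative assumptions, not code from the paper.

```python
import numpy as np

def gfm(F, x0, eta, T, delta=1e-3, rng=None):
    """Sketch of GFM(x0, eta, T): zeroth-order descent with a two-point
    randomized-smoothing gradient estimate on the unit sphere.

    F(x) is assumed to return one stochastic sample of the objective value;
    delta is the smoothing radius (the experiments above use delta = 0.001).
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.array(x0, dtype=float)
    d = x.size
    for _ in range(T):
        w = rng.standard_normal(d)
        w /= np.linalg.norm(w)  # uniform random direction on the unit sphere
        # two-point finite-difference estimate of a (smoothed) gradient
        g = (d / (2.0 * delta)) * (F(x + delta * w) - F(x - delta * w)) * w
        x = x - eta * g  # plain stochastic zeroth-order step
    return x

# illustrative use on a nonsmooth toy objective (not from the paper)
if __name__ == "__main__":
    f = lambda x: np.abs(x).sum() + 0.01 * np.random.randn()
    x_hat = gfm(f, x0=np.ones(10), eta=0.01, T=1000)
```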
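The experiment-setup row ties the per-iteration batch b, the inner-loop length m, and the total batch b′ together via b′ = mb (equivalently b = b′/m) and describes halving the learning rate after 10 iterations without improvement. The following small sketch only illustrates that bookkeeping; the helper names and class are hypothetical and not from the paper's code.

```python
def per_iteration_batch(b_prime, m):
    """Split the total batch b' across an inner loop of length m: b = b' / m (so b' = m * b)."""
    assert b_prime % m == 0, "b' is assumed divisible by m"
    return b_prime // m

class HalvingOnPlateau:
    """Halve the stepsize when the loss has not improved for `patience` iterations."""
    def __init__(self, eta0, patience=10):
        self.eta = eta0
        self.patience = patience
        self.best = float("inf")
        self.stall = 0

    def step(self, loss):
        if loss < self.best:
            self.best, self.stall = loss, 0
        else:
            self.stall += 1
            if self.stall >= self.patience:
                self.eta *= 0.5
                self.stall = 0
        return self.eta

# e.g. b' = 1000 and m = 10 give b = 100; eta starts at 0.5 and halves on plateaus
b = per_iteration_batch(1000, 10)
schedule = HalvingOnPlateau(eta0=0.5, patience=10)
```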