Exploiting Higher Order Smoothness in Derivative-free Optimization and Continuous Bandits

Authors: Arya Akhavan, Massimiliano Pontil, Alexandre Tsybakov

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | We study the problem of zero-order optimization of a strongly convex function. The goal is to find the minimizer of the function by a sequential exploration of its values, under measurement noise. We study the impact of higher order smoothness properties of the function on the optimization error and on the cumulative regret. To solve this problem we consider a randomized approximation of the projected gradient descent algorithm. The gradient is estimated by a randomized procedure involving two function evaluations and a smoothing kernel. We derive upper bounds for this algorithm both in the constrained and unconstrained settings and prove minimax lower bounds for any sequential search method. Our results imply that the zero-order algorithm is nearly optimal in terms of sample complexity and the problem parameters. The contribution is therefore twofold: upper bounds are proved for Algorithm 1, and minimax lower bounds are proved for any sequential search method.
Researcher Affiliation | Academia | Arya Akhavan (Istituto Italiano di Tecnologia and ENSAE Paris Tech, aria.akhavanfoomani@iit.it); Massimiliano Pontil (Istituto Italiano di Tecnologia and University College London, massimiliano.pontil@iit.it); Alexandre B. Tsybakov (ENSAE Paris Tech, alexandre.tsybakov@ensae.fr)
Pseudocode | Yes | Algorithm 1 (Zero-Order Stochastic Projected Gradient). Requires: kernel K : [−1, 1] → ℝ, step size η_t > 0 and parameter h_t, for t = 1, ..., T. Initialization: generate scalars r_1, ..., r_T uniformly on the interval [−1, 1], vectors ζ_1, ..., ζ_T uniformly distributed on the unit sphere S_d = {ζ ∈ ℝ^d : ‖ζ‖ = 1}, and choose x_1 ∈ Θ. For t = 1, ..., T: (1) let y_t = f(x_t + h_t r_t ζ_t) + ξ_t and y'_t = f(x_t − h_t r_t ζ_t) + ξ'_t; (2) define ĝ_t = (d / (2 h_t)) (y_t − y'_t) ζ_t K(r_t); (3) update x_{t+1} = Proj_Θ(x_t − η_t ĝ_t). Return (x_t), t = 1, ..., T. (A hedged code sketch of this procedure follows the table.)
Open Source Code | No | The paper does not contain any statement about making the source code available, nor does it provide a link to a code repository.
Open Datasets | No | This is a theoretical paper that does not perform empirical evaluations on datasets; therefore, no training dataset information is provided.
Dataset Splits | No | This is a theoretical paper that does not perform empirical evaluations; therefore, no dataset split information (training, validation, or test) is provided.
Hardware Specification | No | This is a theoretical paper, and as such, it does not describe any specific hardware used for experiments.
Software Dependencies | No | This is a theoretical paper and does not mention any specific software dependencies with version numbers for reproducibility.
Experiment Setup | No | This is a theoretical paper focused on deriving mathematical bounds, and therefore it does not include details of an experimental setup such as hyperparameters or training configurations.
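
Since the paper provides only pseudocode and no code release, the following is a minimal, hedged Python sketch of a zero-order stochastic projected gradient loop in the style of Algorithm 1. The names (`zero_order_spg`, `f_noisy`, `proj_ball`), the kernel K(r) = 3r, and the step-size and perturbation schedules are illustrative assumptions, not the authors' implementation; the paper chooses its kernel specifically to exploit higher-order smoothness, which this simple odd kernel does not.

```python
import numpy as np

def zero_order_spg(f_noisy, x1, T, eta, h, K, proj, rng=None):
    """Hedged sketch of a zero-order stochastic projected gradient loop
    following the structure of Algorithm 1 (names/defaults are illustrative)."""
    if rng is None:
        rng = np.random.default_rng()
    d = x1.shape[0]
    x = x1.astype(float)
    iterates = []
    for t in range(1, T + 1):
        r = rng.uniform(-1.0, 1.0)                 # r_t ~ Uniform[-1, 1]
        zeta = rng.standard_normal(d)
        zeta /= np.linalg.norm(zeta)               # zeta_t uniform on the unit sphere
        y_plus = f_noisy(x + h(t) * r * zeta)      # y_t  = f(x_t + h_t r_t zeta_t) + noise
        y_minus = f_noisy(x - h(t) * r * zeta)     # y'_t = f(x_t - h_t r_t zeta_t) + noise
        g_hat = (d / (2.0 * h(t))) * (y_plus - y_minus) * K(r) * zeta   # step 2
        x = proj(x - eta(t) * g_hat)               # step 3: projected gradient update
        iterates.append(x.copy())
    return iterates

# Illustrative usage (all choices below are assumptions, not the paper's prescriptions):
# minimize a noisy strongly convex quadratic over the Euclidean unit ball.
d = 5
rng = np.random.default_rng(0)
f_noisy = lambda x: 0.5 * np.sum((x - 0.3) ** 2) + 0.01 * rng.standard_normal()
proj_ball = lambda x: x / max(1.0, np.linalg.norm(x))   # projection onto {||x|| <= 1}
kernel = lambda r: 3.0 * r   # simple odd kernel: E[K(r)] = 0, E[r K(r)] = 1 for r ~ U[-1, 1]
xs = zero_order_spg(f_noisy, np.zeros(d), T=5000,
                    eta=lambda t: 1.0 / t, h=lambda t: t ** -0.25,
                    K=kernel, proj=proj_ball, rng=rng)
```

In this sketch, η_t and h_t are simple decreasing schedules chosen only for illustration; the paper's analysis ties these parameters to the strong convexity constant and the smoothness order, so an actual implementation would set them accordingly.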