Online Convex Optimization with Unconstrained Domains and Losses

Authors: Ashok Cutkosky, Kwabena A. Boahen

NeurIPS 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We propose an online convex optimization algorithm (RESCALEDEXP) that achieves optimal regret in the unconstrained setting without prior knowledge of any bounds on the loss functions. We prove a lower bound showing an exponential separation between the regret of existing algorithms that require a known bound on the loss functions and any algorithm that does not require such knowledge. RESCALEDEXP matches this lower bound asymptotically in the number of iterations. RESCALEDEXP is naturally hyperparameter-free and we demonstrate empirically that it matches prior optimization algorithms that require hyperparameter optimization. To validate our theoretical results in practice, we evaluated RESCALEDEXP on 8 classification datasets.
Researcher Affiliation | Academia | Ashok Cutkosky, Department of Computer Science, Stanford University, ashokc@cs.stanford.edu; Kwabena Boahen, Department of Bioengineering, Stanford University, boahen@stanford.edu
Pseudocode | Yes | Algorithm 1 RESCALEDEXP (a runnable sketch follows the table):
    Initialize: k ← √2, M_0 ← 0, w_1 ← 0, τ ← 1   // τ is the start-time of the current epoch
    for t = 1 to T do
        Play w_t, receive subgradient g_t ∈ ∂ℓ_t(w_t)
        if t = 1 then
            L_1 ← ‖g_1‖
        end if
        M_t ← max(M_{t−1}, ‖g_{τ:t}‖ / √(‖g‖²_{τ:t}))
        η_t ← 1 / (k √(2(M_t + ‖g‖²_{τ:t})))
        // Set w_{t+1} using FTRL update
        w_{t+1} ← −(g_{τ:t}/‖g_{τ:t}‖)[exp(η_t ‖g_{τ:t}‖) − 1]   // = argmin_w { ψ(w)/η_t + g_{τ:t}·w }
        if ‖g_t‖ > 2L_t then
            // Begin a new epoch: update L and restart FTRL
            L_{t+1} ← ‖g_t‖
            τ ← t + 1
            M_t ← 0
            w_{t+1} ← 0
        else
            L_{t+1} ← L_t
        end if
    end for
Open Source Code | No | The paper does not provide an explicit statement or link for open-source code for the described methodology.
Open Datasets | Yes | To validate our theoretical results in practice, we evaluated RESCALEDEXP on 8 classification datasets. The data for each task was pulled from the libsvm website [15], and can be found individually in a variety of sources [16, 17, 18, 19, 20, 21, 22]. We consider the MNIST [18] and CIFAR-10 [27] image classification tasks. (A hypothetical data-loading sketch follows the table.)
Dataset Splits | No | The paper mentions 'final validation accuracy' but does not specify the dataset splits (e.g., percentages or counts) or reference predefined splits to reproduce the partitioning.
Hardware Specification | No | No specific hardware details (such as GPU or CPU models, or memory specifications) used for running the experiments were provided.
Software Dependencies | No | The paper mentions various optimization algorithms like ADAGRAD, ADAM, and ADADELTA, but does not provide specific version numbers for any software dependencies.
Experiment Setup | Yes | We use linear classifiers with hinge-loss for each task and we compare RESCALEDEXP to five other optimization algorithms: ADAGRAD [5], SCALEINVARIANT [23], PISTOL [24], ADAM [25], and ADADELTA [26]. Our MNIST architecture consisted of two consecutive 5×5 convolution and 2×2 max-pooling layers followed by a 512-neuron fully-connected layer. Our CIFAR-10 architecture was two consecutive 5×5 convolution and 3×3 max-pooling layers followed by a 384-neuron fully-connected layer and a 192-neuron fully-connected layer. In order to achieve this performance, we made a slight modification to RESCALEDEXP: when we update L_t, instead of resetting w_t to zero, we re-center the algorithm about the previous prediction point.
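Read as an update loop, the Algorithm 1 pseudocode quoted in the table maps onto a few lines of code. The following NumPy sketch is our reconstruction under the formulas as quoted; the class and variable names (RescaledExpSketch, G, S) are our own, and this is not the authors' released implementation.

```python
import numpy as np

class RescaledExpSketch:
    """Minimal sketch of the RESCALEDEXP update loop quoted above (our reconstruction)."""

    def __init__(self, dim):
        self.k = np.sqrt(2.0)        # constant k from the initialization
        self.w = np.zeros(dim)       # current prediction w_t
        self.G = np.zeros(dim)       # g_{tau:t}: sum of gradients in the current epoch
        self.S = 0.0                 # ||g||^2_{tau:t}: sum of squared gradient norms in the epoch
        self.M = 0.0                 # running maximum M_t
        self.L = None                # current scale estimate L_t

    def update(self, g):
        """Consume the subgradient observed at w_t and return w_{t+1}."""
        g = np.asarray(g, dtype=float)
        norm_g = np.linalg.norm(g)
        if self.L is None:           # t = 1: initialize the scale estimate
            self.L = norm_g if norm_g > 0 else 1.0
        # Accumulate epoch statistics g_{tau:t} and ||g||^2_{tau:t}.
        self.G += g
        self.S += norm_g ** 2
        norm_G = np.linalg.norm(self.G)
        if self.S > 0:
            self.M = max(self.M, norm_G / np.sqrt(self.S))
        # Learning rate eta_t and closed-form FTRL update.
        if self.M + self.S > 0 and norm_G > 0:
            eta = 1.0 / (self.k * np.sqrt(2.0 * (self.M + self.S)))
            self.w = -(self.G / norm_G) * (np.exp(eta * norm_G) - 1.0)
        else:
            self.w = np.zeros_like(self.G)
        # Epoch restart: the scale guess was exceeded by more than a factor of 2.
        if norm_g > 2 * self.L:
            self.L = norm_g
            self.G = np.zeros_like(self.G)
            self.S = 0.0
            self.M = 0.0
            self.w = np.zeros_like(self.G)
        return self.w
```

On each round the caller plays w_t, evaluates a subgradient of its loss at w_t, and passes it to update to obtain w_{t+1}.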
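As a purely hypothetical illustration of reading libsvm-format data such as the datasets cited in the Open Datasets row, scikit-learn's load_svmlight_file can be used; the file names below are placeholders, not paths given in the paper.

```python
from sklearn.datasets import load_svmlight_file

# Hypothetical loading step: "a9a" is a placeholder libsvm-format file name,
# not a path given in the paper; substitute whichever of the eight datasets is used.
X_train, y_train = load_svmlight_file("a9a")
X_test, y_test = load_svmlight_file("a9a.t", n_features=X_train.shape[1])
print(X_train.shape, X_test.shape)
```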
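The Experiment Setup row describes linear classifiers with hinge loss and a variant of RESCALEDEXP that re-centers about the previous prediction instead of resetting to zero when L_t is updated. The sketch below shows one way to express that modification on top of the hypothetical RescaledExpSketch class above; hinge_subgradient, RecenteredRescaledExp, and the synthetic-data loop are our own illustrative constructions, not the authors' code.

```python
import numpy as np

def hinge_subgradient(w, x, y):
    """Subgradient of the hinge loss max(0, 1 - y * <w, x>) with respect to w."""
    return -y * x if y * np.dot(w, x) < 1.0 else np.zeros_like(w)

class RecenteredRescaledExp(RescaledExpSketch):
    """Re-centering variant: on an epoch restart, keep predicting relative to the
    point reached so far instead of returning to the origin."""

    def __init__(self, dim):
        super().__init__(dim)
        self.offset = np.zeros(dim)          # center of the current epoch

    def update(self, g):
        prev_prediction = self.offset + self.w
        # Detect the restart condition before the inner update changes L.
        restarted = self.L is not None and np.linalg.norm(g) > 2 * self.L
        w_inner = super().update(g)          # may reset the inner state to zero
        if restarted:
            self.offset = prev_prediction    # re-center about the previous prediction
        return self.offset + w_inner

# Toy usage: online linear classification with hinge loss on synthetic data.
rng = np.random.default_rng(0)
dim, T = 20, 1000
truth = rng.normal(size=dim)
opt = RecenteredRescaledExp(dim)
w = np.zeros(dim)
for _ in range(T):
    x = rng.normal(size=dim)
    y = 1.0 if truth @ x > 0 else -1.0
    w = opt.update(hinge_subgradient(w, x, y))
```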