Online Convex Optimization with Unconstrained Domains and Losses
Authors: Ashok Cutkosky, Kwabena A. Boahen
NeurIPS 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We propose an online convex optimization algorithm (RESCALEDEXP) that achieves optimal regret in the unconstrained setting without prior knowledge of any bounds on the loss functions. We prove a lower bound showing an exponential separation between the regret of existing algorithms that require a known bound on the loss functions and any algorithm that does not require such knowledge. RESCALEDEXP matches this lower bound asymptotically in the number of iterations. RESCALEDEXP is naturally hyperparameter-free and we demonstrate empirically that it matches prior optimization algorithms that require hyperparameter optimization. To validate our theoretical results in practice, we evaluated RESCALEDEXP on 8 classification datasets. |
| Researcher Affiliation | Academia | Ashok Cutkosky, Department of Computer Science, Stanford University, ashokc@cs.stanford.edu; Kwabena Boahen, Department of Bioengineering, Stanford University, boahen@stanford.edu |
| Pseudocode | Yes | Algorithm 1 RESCALEDEXP. Initialize: k ← 2, M_0 ← 0, w_1 ← 0, t' ← 1 // t' is the start-time of the current epoch. for t = 1 to T do: Play w_t, receive subgradient g_t ∈ ∂ℓ_t(w_t); if t = 1 then L_1 ← ‖g_1‖, η_1 ← √(1/L_1) end if; M_t ← max(M_{t-1}, ‖g_{t':t}‖ / √(‖g‖²_{t':t})); η_t ← 1/(k√(2(M_t + ‖g‖²_{t':t}))); // Set w_{t+1} using FTRL update: w_{t+1} ← −(g_{t':t}/‖g_{t':t}‖)[exp(η_t‖g_{t':t}‖) − 1] // = argmin_w [ψ(w)/η_t + g_{t':t}·w]; if ‖g_t‖ > 2L_t then // Begin a new epoch: update L and restart FTRL: L_{t+1} ← ‖g_t‖, η_{t+1} ← √(1/L_{t+1}), t' ← t + 1, M_t ← 0, w_{t+1} ← 0; else L_{t+1} ← L_t end if; end for. [A runnable Python sketch of this loop follows the table.] |
| Open Source Code | No | The paper does not provide an explicit statement or link for open-source code for the described methodology. |
| Open Datasets | Yes | To validate our theoretical results in practice, we evaluated RESCALEDEXP on 8 classification datasets. The data for each task was pulled from the libsvm website [15], and can be found individually in a variety of sources [16, 17, 18, 19, 20, 21, 22]. We consider the MNIST [18] and CIFAR-10 [27] image classification tasks. |
| Dataset Splits | No | The paper mentions 'final validation accuracy' but does not specify the dataset splits (e.g., percentages or counts) or reference predefined splits to reproduce the partitioning. |
| Hardware Specification | No | No specific hardware details (such as GPU or CPU models, or memory specifications) used for running the experiments were provided. |
| Software Dependencies | No | The paper mentions various optimization algorithms like ADAGRAD, ADAM, and ADADELTA, but does not provide specific version numbers for any software dependencies. |
| Experiment Setup | Yes | We use linear classifiers with hinge-loss for each task and we compare RESCALEDEXP to five other optimization algorithms: ADAGRAD [5], SCALEINVARIANT [23], PISTOL [24], ADAM [25], and ADADELTA [26]. Our MNIST architecture consisted of two consecutive 5×5 convolution and 2×2 max-pooling layers followed by a 512-neuron fully-connected layer. Our CIFAR-10 architecture was two consecutive 5×5 convolution and 3×3 max-pooling layers followed by a 384-neuron fully-connected layer and a 192-neuron fully-connected layer. In order to achieve this performance, we made a slight modification to RESCALEDEXP: when we update L_t, instead of resetting w_t to zero, we re-center the algorithm about the previous prediction point. [A hinge-loss wiring example follows the table.] |
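
The pseudocode row above describes the RESCALEDEXP update loop. Below is a minimal Python sketch of that loop, reconstructed from the extracted pseudocode rather than from released code (the paper reports none). The η_t formula and the per-epoch learning-rate handling follow the reconstruction above and should be read as assumptions; `rescaledexp` and `grad_fn` are names introduced here, with `grad_fn` a hypothetical callback that supplies the round-t subgradient.

```python
import numpy as np

def rescaledexp(grad_fn, dim, T, k=2.0):
    """Sketch of the RESCALEDEXP loop, reconstructed from the pseudocode above."""
    w = np.zeros(dim)
    M = 0.0                # running max of ||g_{t':t}|| / sqrt(sum of ||g_tau||^2)
    g_sum = np.zeros(dim)  # g_{t':t}: sum of subgradients in the current epoch
    g_sq = 0.0             # sum of squared subgradient norms in the current epoch
    L = None               # current estimate of the Lipschitz bound
    for t in range(1, T + 1):
        g = grad_fn(w, t)                  # play w_t, receive subgradient g_t
        if L is None:
            L = np.linalg.norm(g)          # L_1 <- ||g_1||
        g_sum = g_sum + g
        g_sq += float(np.dot(g, g))
        norm = np.linalg.norm(g_sum)
        if norm > 0:
            M = max(M, norm / np.sqrt(g_sq))
            eta = 1.0 / (k * np.sqrt(2.0 * (M + g_sq)))  # assumed eta_t formula
            # FTRL closed form: w_{t+1} = -(g_sum/||g_sum||) (exp(eta ||g_sum||) - 1)
            w = -(g_sum / norm) * (np.exp(eta * norm) - 1.0)
        if np.linalg.norm(g) > 2.0 * L:
            # begin a new epoch: update L and restart FTRL from the origin
            L = np.linalg.norm(g)
            g_sum = np.zeros(dim)
            g_sq = 0.0
            M = 0.0
            w = np.zeros(dim)
    return w
```

The closed-form step corresponds to the FTRL update noted in the pseudocode comment; the epoch restart doubles the Lipschitz estimate's trigger threshold by construction, since a restart only occurs when a subgradient exceeds twice the current estimate.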
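The experiment-setup row mentions linear classifiers with hinge loss on the libsvm tasks. As a hedged illustration of how such a run could be wired up (this is not the authors' experimental code), the snippet below feeds a hinge-loss subgradient on synthetic data into the `rescaledexp` sketch above; `hinge_subgradient`, the synthetic dataset, and the accuracy check are placeholders introduced here.

```python
import numpy as np

def hinge_subgradient(w, x, y):
    """Subgradient of the binary hinge loss max(0, 1 - y * <w, x>) at w."""
    return -y * x if y * np.dot(w, x) < 1.0 else np.zeros_like(x)

# Synthetic stand-in for one of the paper's 8 libsvm classification tasks.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))
true_w = rng.normal(size=20)
y = np.sign(X @ true_w)

def grad_fn(w, t):
    # stream examples in order: one example per online round
    x_t, y_t = X[t - 1], y[t - 1]
    return hinge_subgradient(w, x_t, y_t)

w_final = rescaledexp(grad_fn, dim=X.shape[1], T=len(X))
accuracy = np.mean(np.sign(X @ w_final) == y)
print(f"training accuracy of the final iterate: {accuracy:.3f}")
```

Note that the paper's neural-network experiments used a modified variant that re-centers about the previous prediction point on each epoch restart instead of resetting to zero; the sketch above implements the unmodified reset.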