Better Parameter-Free Stochastic Optimization with ODE Updates for Coin-Betting
Authors: Keyi Chen, John Langford, Francesco Orabona (pp. 6239-6247)
AAAI 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show empirically that this new parameter-free algorithm outperforms algorithms with the best default learning rates and almost matches the performance of finely tuned baselines without anything to tune. (Sec. 4, Empirical Evaluation) Here, we compare CODE with SGD, SGD with truncated models (aProx) (Asi and Duchi 2019), SGD with Importance Weight Aware updates (IWA) (Karampatziakis and Langford 2011), AdaGrad (Duchi, Hazan, and Singer 2011), Adam (Kingma and Ba 2015), the coin-betting algorithm in (2) (Coin) (Orabona and Pal 2016), and the recursive coin-betting algorithm (Recursive) (Cutkosky and Sarlos 2019). We test the ability of CODE to get a good generalization error. Hence, we perform experiments with 21 different machine learning binary classification datasets and 17 regression datasets from the LIBSVM website (Chang and Lin 2011) and OpenML (Vanschoren et al. 2013). |
| Researcher Affiliation | Collaboration | Keyi Chen¹, John Langford², Francesco Orabona¹; ¹ Boston University, Boston, MA; ² Microsoft Research, New York, NY; keyichen@bu.edu, jcl@microsoft.com, francesco@orabona.com |
| Pseudocode | Yes | Algorithm 1: Coin-betting ODE (CODE). 1: Initialize: Wealth_0 = 1, H_1 = 1, θ_1 = 0 ∈ R^d. 2: for t = 1, ..., T do. 3: Query point x_t = (Wealth_t / H_t) θ_t. 4: Receive g_t such that E[g_t] ∈ ∂F(x_t), ‖g_t‖ ≤ 1. 5: Calculate h_t = min(1, h̃_t), where h̃_t is the zero of the function φ in (7). 6: Update Wealth_{t+1} = Wealth_t · exp(−⟨g_t, θ_t⟩ ln(1 + h_t/H_t) + ‖g_t‖² (h_t + H_t ln(H_t/(H_t + h_t)))). 7: Update H_{t+1} = H_t + h_t. 8: Update θ_{t+1} = θ_t − h_t g_t. 9: end for. (A hedged Python sketch of this loop follows the table.) |
| Open Source Code | No | The paper does not provide a direct link to open-source code for the described methodology, nor does it explicitly state that the code is released or available in supplementary materials. |
| Open Datasets | Yes | We pre-process the samples normalizing them to unit norm vectors. We shuffle the data and use 70% for training, 15% for validation, and hold out 15% for testing. We perform experiments with 21 different machine learning binary classification datasets and 17 regression datasets from the LIBSVM website (Chang and Lin 2011) and OpenML (Vanschoren et al. 2013). |
| Dataset Splits | Yes | We shuffle the data and use 70% for training, 15% for validation, and hold out 15% for testing. |
| Hardware Specification | No | The paper does not specify any hardware details (e.g., GPU models, CPU types, memory) used for running the experiments. |
| Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies or libraries used in the implementation. |
| Experiment Setup | Yes | For SGD, aProx, and IWA, we use the optimal worst-case step size for stochastic convex optimization, η_k = η_0/√k, and tune the initial step size η_0. In the adaptive learning rate methods, AdaGrad and Adam, we tune the initial step size η_0. For each repetition and dataset, we use the validation set to select the best learning rate, train using that learning rate, test on the test set, and report the average of normalized loss. (A hedged sketch of this protocol follows the table.) |
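
The pseudocode row above can be read as a simple update loop. Below is a minimal Python sketch of Algorithm 1 (CODE) as quoted there, not the authors' released implementation (the paper provides none). The gradient oracle `grad_oracle` and the root-finder `zero_of_phi` for the function φ in the paper's equation (7) are hypothetical placeholders supplied by the caller; equation (7) is not reproduced in this summary, so how h_t is actually computed is left abstract here.

```python
import numpy as np


def code_updates(grad_oracle, zero_of_phi, dim, T):
    """Minimal sketch of Algorithm 1 (CODE), following the pseudocode row above.

    grad_oracle(x)   -- hypothetical: returns a stochastic (sub)gradient g_t at x
                        with E[g_t] in the subdifferential of F and ||g_t|| <= 1.
    zero_of_phi(...) -- hypothetical: returns the zero of the scalar function phi
                        from the paper's equation (7), which is not quoted here.
    """
    wealth = 1.0             # Wealth_0 = 1
    H = 1.0                  # H_1 = 1
    theta = np.zeros(dim)    # theta_1 = 0 in R^d

    for _ in range(T):
        x = wealth * theta / H                             # x_t = (Wealth_t / H_t) * theta_t
        g = grad_oracle(x)                                 # stochastic subgradient, ||g|| <= 1
        h = min(1.0, zero_of_phi(g, theta, wealth, H))     # h_t = min(1, zero of phi in (7))

        # Wealth_{t+1} = Wealth_t * exp(-<g_t, theta_t> ln(1 + h_t/H_t)
        #                               + ||g_t||^2 (h_t + H_t ln(H_t / (H_t + h_t))))
        wealth *= np.exp(-np.dot(g, theta) * np.log1p(h / H)
                         + np.dot(g, g) * (h + H * np.log(H / (H + h))))
        H += h                                             # H_{t+1} = H_t + h_t
        theta = theta - h * g                              # theta_{t+1} = theta_t - h_t * g_t

    return wealth * theta / H                              # final query point
```

Apart from computing h_t, every step of the quoted loop is in closed form, which is why the sketch only abstracts away the root-finding for φ.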
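To make the evaluation protocol in the "Open Datasets", "Dataset Splits", and "Experiment Setup" rows concrete, here is a hedged sketch of the stated data handling and baseline step-size schedule. The function names (`split_unit_norm`, `sgd_step_size`) are hypothetical, since no code accompanies the paper; the sketch only encodes what the quotes say: unit-norm normalization, a shuffled 70/15/15 train/validation/test split, and η_k = η_0/√k for the SGD-style baselines with η_0 selected on the validation set.

```python
import numpy as np


def split_unit_norm(X, y, rng):
    """Normalize samples to unit norm, shuffle, and split 70/15/15 as described."""
    X = X / np.maximum(np.linalg.norm(X, axis=1, keepdims=True), 1e-12)
    idx = rng.permutation(len(y))
    X, y = X[idx], y[idx]
    n_tr = int(0.70 * len(y))
    n_va = int(0.15 * len(y))
    train = (X[:n_tr], y[:n_tr])
    valid = (X[n_tr:n_tr + n_va], y[n_tr:n_tr + n_va])
    test = (X[n_tr + n_va:], y[n_tr + n_va:])
    return train, valid, test


def sgd_step_size(eta0, k):
    """Worst-case optimal schedule for stochastic convex optimization: eta_k = eta_0 / sqrt(k)."""
    return eta0 / np.sqrt(k)


# Per the quoted setup, eta_0 is chosen on the validation split (per repetition
# and dataset), the model is retrained with that eta_0, and the normalized loss
# is reported on the held-out test split.
```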