Adaptive Scale-Invariant Online Algorithms for Learning Linear Models

Authors: Michal Kempka, Wojciech Kotlowski, Manfred K. Warmuth

ICML 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We test both algorithms in a computational study on several real-life data sets and show that without any need to tune parameters, they are competitive to popular online learning methods, which are allowed to tune their learning rates to optimize the test set performance." "To further check empirical performance of our algorithms we tested them on some popular real-life benchmark datasets."
Researcher Affiliation | Collaboration | (1) Poznan University of Technology, Poznan, Poland; (2) Google Inc. Zürich & UC Santa Cruz.
Pseudocode | Yes | "Algorithm 1: ScInOL1(ϵ = 1)" and "Algorithm 2: ScInOL2(ϵ = 1)" are explicitly labeled algorithm blocks with structured steps.
Open Source Code | No | The paper states "We implemented our algorithms in Tensorflow and used existing implementations whenever it was possible," but no statement about open-sourcing the authors' own code is present.
Open Datasets | Yes | "We chose 5 datasets with varying levels of feature scale variance from UCI repository (Dheeru & Karra Taniskidou, 2017) (Covertype, Census, Shuttle, Bank, Madelon) and a popular benchmark dataset MNIST (LeCun & Cortes, 2010)."
Dataset Splits | No | "Datasets that do not provide separate testing sets were split randomly into training/test sets (2/1 ratio)." The paper explicitly states a train/test split but does not mention a separate validation set or how one was used, if at all.
Hardware Specification | No | The paper mentions implementing the algorithms in TensorFlow but provides no details about the hardware used for the experiments (e.g., CPU or GPU models, memory, or cloud infrastructure).
Software Dependencies | No | "We implemented our algorithms in Tensorflow and used existing implementations whenever it was possible." Software names are mentioned, but no version numbers are given for reproducibility.
Experiment Setup | Yes | "All algorithms except ours have a learning rate parameter, which in each case was set to values from {0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1, 5, 10} (results concerning all learning rates were reported)." "The algorithms using hand-picked learning rates (SGD, AdaGrad, Adam, NAG) were run with values from {0.00001, 0.0001, 0.001, 0.01, 0.1, 1.0} (all other parameters were kept default)."
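The 2/1 random train/test split quoted in the Dataset Splits row can be sketched as follows. This is a minimal sketch, not the authors' code: only the 2:1 ratio and random splitting come from the paper; the NumPy implementation, the RNG seed, and the rounding of the cut point are assumptions.

```python
import numpy as np

def split_2_to_1(X, y, seed=0):
    """Randomly split (X, y) into train/test sets with a 2/1 ratio.

    Sketch of the protocol quoted above; the seed and the exact
    rounding of the cut point are assumptions, not from the paper.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))   # random shuffle of example indices
    cut = (2 * len(X)) // 3         # first two thirds go to training
    return X[idx[:cut]], y[idx[:cut]], X[idx[cut:]], y[idx[cut:]]
```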
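The Experiment Setup row describes a learning-rate sweep in which results for every rate are reported rather than only the best one. A minimal sketch of that protocol, assuming hypothetical `train_fn` and `eval_fn` placeholders; only the first grid of rates comes from the paper.

```python
# Grid of learning rates quoted in the Experiment Setup row.
LEARNING_RATES = [0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1, 5, 10]

def sweep(train_fn, eval_fn, rates=LEARNING_RATES):
    """Train one model per learning rate and report test performance
    for all of them (the paper reports every rate, not just the best).

    train_fn and eval_fn are hypothetical placeholders, not from the paper.
    """
    return {lr: eval_fn(train_fn(lr)) for lr in rates}
```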