Adaptive Scale-Invariant Online Algorithms for Learning Linear Models
Authors: Michal Kempka, Wojciech Kotlowski, Manfred K. Warmuth
ICML 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We test both algorithms in a computational study on several real-life data sets and show that without any need to tune parameters, they are competitive to popular online learning methods, which are allowed to tune their learning rates to optimize the test set performance. To further check empirical performance of our algorithms we tested them on some popular real-life benchmark datasets. |
| Researcher Affiliation | Collaboration | 1Poznan University of Technology, Poznan, Poland 2Google Inc. Zürich & UC Santa Cruz. |
| Pseudocode | Yes | Algorithm 1: ScInOL1(ϵ = 1) and Algorithm 2: ScInOL2(ϵ = 1) are explicitly labeled algorithm blocks with structured steps. |
| Open Source Code | No | We implemented our algorithms in Tensorflow and used existing implementations whenever it was possible. No statement about open-sourcing *their* code is present. |
| Open Datasets | Yes | We chose 5 datasets with varying levels of feature scale variance from UCI repository (Dheeru & Karra Taniskidou, 2017) (Covertype, Census, Shuttle, Bank, Madelon) and a popular benchmark dataset MNIST (LeCun & Cortes, 2010). |
| Dataset Splits | No | Datasets that do not provide separate testing sets were split randomly into training/test sets (2/1 ratio). The paper explicitly states a train/test split but does not mention a separate validation set or how one would have been used. |
| Hardware Specification | No | The paper mentions implementing algorithms in TensorFlow but does not provide any specific details about the hardware (e.g., CPU, GPU models, memory, or cloud infrastructure) used for running the experiments. |
| Software Dependencies | No | We implemented our algorithms in TensorFlow and used existing implementations whenever it was possible. The paper mentions software names but no specific version numbers for reproducibility. |
| Experiment Setup | Yes | All algorithms except ours have a learning rate parameter, which in each case was set to values from {0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1, 5, 10} (results concerning all learning rates were reported). The algorithms using hand-picked learning rates (SGD, AdaGrad, Adam, NAG) were run with values from {0.00001, 0.0001, 0.001, 0.01, 0.1, 1.0} (all other parameters were kept default). |
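The evaluation protocol described in the table (a random 2/1 train/test split, plus a sweep over a hand-picked learning-rate grid for each baseline) can be sketched as follows. This is a hedged illustration, not the paper's code: the dataset is a synthetic placeholder for the UCI/MNIST benchmarks, and plain logistic-loss SGD stands in for any of the tuned baselines (SGD, AdaGrad, Adam, NAG).

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic binary classification data (placeholder for a UCI dataset).
n, d = 300, 5
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = (X @ w_true > 0).astype(float)

# Random 2/1 train/test split, as stated in the paper.
idx = rng.permutation(n)
n_train = 2 * n // 3
train, test = idx[:n_train], idx[n_train:]

def train_sgd(X, y, lr, epochs=5):
    """Online SGD on logistic loss (a stand-in for any tuned baseline)."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            p = 1.0 / (1.0 + np.exp(-np.clip(xi @ w, -30, 30)))
            w -= lr * (p - yi) * xi
    return w

# Learning-rate grid reported for the hand-tuned baselines.
grid = [1e-5, 1e-4, 1e-3, 1e-2, 1e-1, 1.0]
results = {}
for lr in grid:
    w = train_sgd(X[train], y[train], lr)
    acc = np.mean((X[test] @ w > 0) == (y[test] > 0.5))
    results[lr] = acc  # all grid points are reported, not just the best
```

The key point the table flags: the proposed scale-invariant algorithms run with no such grid, while every baseline's reported performance depends on this tuning loop.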