Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
ChaCha for Online AutoML
Authors: Qingyun Wu, Chi Wang, John Langford, Paul Mineiro, Marco Rossi
ICML 2021 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, we show that ChaCha provides good performance across a wide array of datasets when optimizing over featurization and hyperparameter decisions. We test the ChaCha algorithm on a suite of large regression datasets from OpenML (Vanschoren et al., 2014) for two online AutoML tasks. Figure 1 shows a demonstrative result obtained by ChaCha for tuning feature interaction choices, eclipsing a widely used online learning algorithm. Further experimentation demonstrates ChaCha is consistently near-best amongst plausible alternatives. |
| Researcher Affiliation | Collaboration | Microsoft Research. Correspondence to: Qingyun Wu <EMAIL>, John Langford <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 ChaCha; Algorithm 2 Schedule(b, B, S); Algorithm 3 Choose(S) |
| Open Source Code | Yes | Our method is open-sourced in the AutoML library FLAML (https://github.com/microsoft/FLAML/tree/main/flaml/onlineml). Please find a demonstration of usage in this notebook: https://github.com/microsoft/FLAML/blob/main/notebook/flaml_autovw.ipynb |
| Open Datasets | Yes | We evaluate our method on a set of large-scale (number of instances: 10K to 1M) regression datasets from OpenML (40 in total). All the datasets are publicly available in OpenML (https://www.openml.org/search?type=data). |
| Dataset Splits | No | The paper uses 'progressive validation loss' as an evaluation metric in an online learning setting, but it does not specify traditional train/validation/test dataset splits (e.g., percentages or sample counts) as is common in batch learning. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory, or cloud instance types) used for running the experiments. |
| Software Dependencies | No | The paper mentions using 'Vowpal Wabbit' for evaluation but does not specify a version number. No other software dependencies with version numbers are listed. |
| Experiment Setup | Yes | We perform the main evaluation under the constraint that a maximum of 5 live learners are allowed, i.e., b = 5. We use the default configuration in VW as the initial configuration c_init: no feature interactions, and the learning rate is 0.5. We use the VW default learning algorithm (which uses a variant of online gradient descent) as the base learner. For all the experiments, we run each method 5 times with different settings of the random seed. |
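
The "Dataset Splits" row notes that the paper reports progressive validation loss rather than using fixed train/validation/test splits. The sketch below illustrates how that metric is computed in general: each example is scored before the learner updates on it, so every prediction is made on unseen data. It is a minimal illustration, not the paper's or VW's implementation. The function name, the plain linear model, and the squared loss are assumptions; the plain gradient step only stands in for VW's default learner, which the paper describes as a variant of online gradient descent. The learning rate of 0.5 is taken from the quoted setup.

```python
from typing import Iterable, Sequence, Tuple

Example = Tuple[Sequence[float], float]  # (feature vector, regression target)


def progressive_validation_loss(
    stream: Iterable[Example], learning_rate: float = 0.5
) -> float:
    """Mean squared loss where each example is scored BEFORE it is learned.

    Hypothetical helper for illustration only: a plain linear model trained
    with online gradient descent, not VW's actual default update rule.
    """
    weights = None
    total_loss = 0.0
    count = 0
    for features, target in stream:
        if weights is None:
            # Start from an all-zero model sized to the first example.
            weights = [0.0] * len(features)
        # 1) Predict with the current model (the example is still unseen).
        prediction = sum(w * x for w, x in zip(weights, features))
        error = prediction - target
        # 2) Accumulate the loss on that pre-update prediction.
        total_loss += error * error
        count += 1
        # 3) Only now update on the example (gradient of 0.5 * error**2).
        weights = [w - learning_rate * error * x for w, x in zip(weights, features)]
    return total_loss / count if count else 0.0


if __name__ == "__main__":
    toy_stream = [((1.0,), 1.0), ((1.0,), 1.0)]
    print(progressive_validation_loss(toy_stream))
```

Because each example acts as test data once and as training data afterwards, a single pass over the stream produces the evaluation number without a held-out split.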