Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.

ChaCha for Online AutoML

Authors: Qingyun Wu, Chi Wang, John Langford, Paul Mineiro, Marco Rossi

ICML 2021 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirically, we show that ChaCha provides good performance across a wide array of datasets when optimizing over featurization and hyperparameter decisions. We test the ChaCha algorithm on a suite of large regression datasets from OpenML (Vanschoren et al., 2014) for two online AutoML tasks. Figure 1 shows a demonstrative result obtained by ChaCha for tuning feature interaction choices, eclipsing a widely used online learning algorithm. Further experimentation demonstrates ChaCha is consistently near-best amongst plausible alternatives.
Researcher Affiliation | Collaboration | ¹Microsoft Research. Correspondence to: Qingyun Wu <EMAIL>, John Langford <EMAIL>.
Pseudocode | Yes | Algorithm 1 ChaCha; Algorithm 2 Schedule(b, B, S); Algorithm 3 Choose(S)
Open Source Code | Yes | Our method is open-sourced in the AutoML Library FLAML². Please find a demonstration of usage in this notebook³. ² https://github.com/microsoft/FLAML/tree/main/flaml/onlineml ³ https://github.com/microsoft/FLAML/blob/main/notebook/flaml_autovw.ipynb
Open Datasets | Yes | We evaluate our method on a set of large scale (# of instances: 10K to 1M) regression datasets from OpenML (in total 40). All the datasets are publicly available in OpenML⁴. ⁴ https://www.openml.org/search?type=data
Dataset Splits | No | The paper uses 'progressive validation loss' as an evaluation metric in an online learning setting, but it does not specify traditional train/validation/test dataset splits (e.g., percentages or sample counts) as is common in batch learning.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory, or cloud instance types) used for running the experiments.
Software Dependencies | No | The paper mentions using 'Vowpal Wabbit' for evaluation but does not specify a version number. No other software dependencies with version numbers are listed.
Experiment Setup | Yes | We perform the main evaluation under the constraint that a maximum of 5 live learners are allowed, i.e., b = 5. We use the default configuration in VW as the initial configuration c_init: no feature interactions, and the learning rate is 0.5. We use the VW default learning algorithm (which uses a variant of online gradient descent) as the base learner. For all the experiments, we run each method 5 times with different random seeds.
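The 'progressive validation loss' named under Dataset Splits can be illustrated with a small sketch: each example is scored before the learner updates on it, so no held-out split is needed. The code below is an illustration under stated assumptions, not the paper's or Vowpal Wabbit's implementation. The base learner is plain online gradient descent on squared loss (VW's default is only described as a variant of this), the data stream is synthetic, and the names `progressive_validation_loss` and `make_stream` are hypothetical. Only the learning rate of 0.5 and the 5 random seeds are taken from the reported setup.

```python
import random


def progressive_validation_loss(stream, learning_rate=0.5):
    """Average squared error, scoring each example before training on it."""
    weights = None
    total_loss = 0.0
    count = 0
    for features, target in stream:
        if weights is None:
            weights = [0.0] * len(features)
        # Predict first: the model has not yet seen this example.
        prediction = sum(w * x for w, x in zip(weights, features))
        error = prediction - target
        total_loss += error ** 2
        count += 1
        # Then learn: one gradient step on 0.5 * error^2 (assumed base learner).
        weights = [w - learning_rate * error * x
                   for w, x in zip(weights, features)]
    return total_loss / count if count else 0.0


def make_stream(seed, n=2000):
    """Synthetic regression stream (hypothetical stand-in for an OpenML dataset)."""
    rng = random.Random(seed)
    true_weights = [0.5, -0.25, 0.1]
    for _ in range(n):
        x = [rng.uniform(-1.0, 1.0) for _ in true_weights]
        y = sum(w * xi for w, xi in zip(true_weights, x)) + rng.gauss(0.0, 0.01)
        yield x, y


# Repeat the run with 5 different random seeds and average the results.
losses = [progressive_validation_loss(make_stream(seed)) for seed in range(5)]
mean_loss = sum(losses) / len(losses)
print(f"mean progressive validation loss over 5 seeds: {mean_loss:.5f}")
```

Because every example serves first as a test point and then as a training point, the metric reflects the learner's performance over the whole stream, which is why the absence of conventional train/validation/test splits is expected in this setting.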