Boosting with Multiple Sources

Authors: Corinna Cortes, Mehryar Mohri, Dmitry Storcheus, Ananda Theertha Suresh

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We also report the results of several experiments with our algorithm demonstrating that it outperforms natural baselines on multi-source text-based, image-based and tabular data. We further present an extension of our algorithm to the federated learning scenario and report favorable experimental results for that setting as well."
Researcher Affiliation | Collaboration | Corinna Cortes (Google Research, New York, NY 10011, corinna@google.com); Mehryar Mohri (Google & Courant Institute, New York, NY 10012, mohri@google.com); Dmitry Storcheus (Courant Institute & Google, New York, NY 10012, dstorcheus@google.com); Ananda Theertha Suresh (Google Research, New York, NY 10011, theertha@google.com)
Pseudocode | Yes | "The pseudocode of our algorithm, MULTIBOOST, is provided in Figure 1."
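Figure 1 itself is not reproduced in this report. For rough orientation only, below is a minimal AdaBoost-style sketch over the pooled sources with decision stumps, assuming binary labels in {-1, +1}; it omits MULTIBOOST's per-source reweighting machinery, and all function and variable names are ours.

```python
# Illustrative only: a generic AdaBoost loop over the union of p sources.
# This is NOT the paper's MULTIBOOST (Figure 1), which additionally
# maintains source-level weights; names here are hypothetical.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def boost_pooled_sources(sources, T=100):
    """sources: list of (X_k, y_k) pairs with labels in {-1, +1}."""
    X = np.vstack([Xk for Xk, _ in sources])
    y = np.concatenate([yk for _, yk in sources])
    w = np.full(len(y), 1.0 / len(y))            # uniform initial distribution
    ensemble = []
    for _ in range(T):
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=w)
        pred = stump.predict(X)
        err = np.clip(w @ (pred != y), 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)    # closed-form step for exp loss
        w *= np.exp(-alpha * y * pred)           # multiplicative weight update
        w /= w.sum()
        ensemble.append((alpha, stump))
    return ensemble

def ensemble_predict(ensemble, X):
    return np.sign(sum(a * s.predict(X) for a, s in ensemble))
```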
Open Source Code | No | The paper does not provide any explicit statement or link indicating the availability of open-source code for the described methodology.
Open Datasets | Yes | "Datasets and preprocessing steps used are described below, with additional dataset details provided in Appendix H. Note that all datasets are public and do not contain any personal identifiers or offensive information."
Dataset Splits | Yes | "The errors and their standard deviations are reported based on 10-fold cross-validation. Each source S_k is independently split into 10 folds S_k^1, ..., S_k^10. For the i-th cross-validation step, the test set is {S_1^i, ..., S_p^i}, while the rest is used for training."
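Concretely, that split can be reproduced as in the short sketch below, where each source is an (X_k, y_k) pair; shuffling and the fixed seed are our assumptions, since the paper does not state them.

```python
# A minimal sketch of the per-source 10-fold cross-validation split:
# each source S_k is split into 10 folds independently, and fold i taken
# across all p sources forms the i-th test set. Names are ours.
import numpy as np
from sklearn.model_selection import KFold

def per_source_cv_splits(sources, n_folds=10, seed=0):
    """sources: list of (X_k, y_k); yields per-source (train, test) index lists."""
    folds = [list(KFold(n_folds, shuffle=True, random_state=seed).split(Xk))
             for Xk, _ in sources]
    for i in range(n_folds):
        train_idx = [folds[k][i][0] for k in range(len(sources))]
        test_idx = [folds[k][i][1] for k in range(len(sources))]
        yield train_idx, test_idx
```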
Hardware Specification | Yes | "The experiments were performed on Linux and Mac workstations with Quad-Core Intel Core i7 2.9 GHz and Intel Xeon 2.20 GHz respectively."
Software Dependencies | No | The paper does not provide specific version numbers for ancillary software or libraries used in the experiments.
Experiment Setup | Yes | "Our study is restricted to learning an ensemble of decision stumps H_stumps using the exponential surrogate loss Φ(u) = e^{-u}. ... We used T = 100 boosting steps for all benchmarks. ... To estimate the probabilities Q(k|x) for k ∈ [p], we assigned the label k to each sample from domain k and used multinomial logistic regression. ... Alternatively, for some experiments we used line search with 100 steps."
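The Q(k|x) estimation step described in the quote lends itself to a short sketch: every sample is labeled with the index k of its source, and a multinomial logistic regression is fit on these labels. Solver and regularization settings below are assumptions on our part; the paper does not report them.

```python
# Hedged sketch of estimating Q(k|x): fit multinomial logistic regression
# on source indices, then read off class probabilities. Hyperparameters
# below (solver defaults, max_iter) are assumptions, not reported values.
import numpy as np
from sklearn.linear_model import LogisticRegression

def estimate_domain_probs(sources):
    """sources: list of (X_k, y_k); returns a model whose predict_proba
    gives [Q(1|x), ..., Q(p|x)] for each row x."""
    X = np.vstack([Xk for Xk, _ in sources])
    k_labels = np.concatenate(
        [np.full(len(Xk), k) for k, (Xk, _) in enumerate(sources)])
    clf = LogisticRegression(max_iter=1000)  # multinomial for >2 sources
    return clf.fit(X, k_labels)
```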