Boosting with Multiple Sources
Authors: Corinna Cortes, Mehryar Mohri, Dmitry Storcheus, Ananda Theertha Suresh
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We also report the results of several experiments with our algorithm demonstrating that it outperforms natural baselines on multi-source text-based, image-based and tabular data. We further present an extension of our algorithm to the federated learning scenario and report favorable experimental results for that setting as well. |
| Researcher Affiliation | Collaboration | Corinna Cortes, Google Research, New York, NY 10011, corinna@google.com; Mehryar Mohri, Google & Courant Institute, New York, NY 10012, mohri@google.com; Dmitry Storcheus, Courant Institute & Google, New York, NY 10012, dstorcheus@google.com; Ananda Theertha Suresh, Google Research, New York, NY 10011, theertha@google.com |
| Pseudocode | Yes | The pseudocode of our algorithm, MULTIBOOST, is provided in Figure 1. |
| Open Source Code | No | The paper does not provide any explicit statement or link indicating the availability of open-source code for the described methodology. |
| Open Datasets | Yes | Datasets and preprocessing steps used are described below with additional dataset details provided in Appendix H. Note that all datasets are public and do not contain any personal identifiers or offensive information. |
| Dataset Splits | Yes | The errors and their standard deviations are reported based on 10-fold cross validation. Each source S_k is independently split into 10 folds S_k^1, ..., S_k^10. For the i-th cross-validation step, the test set is {S_1^i, ..., S_p^i}, while the rest is used for training. |
| Hardware Specification | Yes | The experiments were performed on Linux and Mac workstations with Quad-Core Intel Core i7 2.9 GHz and Intel Xeon 2.20 GHz respectively. |
| Software Dependencies | No | The paper does not provide specific version numbers for ancillary software or libraries used in the experiments. |
| Experiment Setup | Yes | Our study is restricted to learning an ensemble of decision stumps H_stumps using the exponential surrogate loss Φ(u) = e^(-u). ... We used T = 100 boosting steps for all benchmarks. ... To estimate the probabilities Q(k|x) for k ∈ [p], we assigned the label k to each sample from domain k and used multinomial logistic regression. ... Alternatively, for some experiments we used line search with 100 steps. |
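The per-source cross-validation scheme quoted in the Dataset Splits row (each source split independently into 10 folds, with fold i from every source pooled into the i-th test set) can be sketched as follows. Function names (`make_source_folds`, `cv_splits`) and the seeding scheme are illustrative, not from the paper.

```python
import random

def make_source_folds(source, n_folds=10, seed=0):
    """Shuffle one source and partition it into n_folds roughly equal folds."""
    rng = random.Random(seed)
    idx = list(range(len(source)))
    rng.shuffle(idx)
    return [[source[j] for j in idx[i::n_folds]] for i in range(n_folds)]

def cv_splits(sources, n_folds=10):
    """Yield (train, test) pairs where test = fold i pooled across all p sources."""
    folds = [make_source_folds(s, n_folds, seed=k) for k, s in enumerate(sources)]
    for i in range(n_folds):
        test = [x for per_source in folds for x in per_source[i]]
        train = [x for per_source in folds
                 for j, fold in enumerate(per_source) if j != i
                 for x in fold]
        yield train, test
```

Each source is split on its own, so every test set contains examples from all p domains, matching the quoted description.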
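The Experiment Setup row names the building blocks: decision stumps, the exponential surrogate loss, and T = 100 boosting rounds. The paper's MULTIBOOST pseudocode (its Figure 1) is not reproduced here; purely as context for those ingredients, a plain single-source AdaBoost over exhaustively searched stumps might look like the sketch below. All names are illustrative, and this is not the paper's algorithm.

```python
import math

def best_stump(X, y, w):
    """Pick the stump (feature, threshold, sign) with lowest weighted error;
    y in {-1, +1}, w a normalized weight vector over examples."""
    d = len(X[0])
    best = (1.0, 0, 0.0, 1)  # (error, feature, threshold, sign)
    for f in range(d):
        for thr in sorted({x[f] for x in X}):
            for sign in (1, -1):
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if sign * (1 if xi[f] >= thr else -1) != yi)
                if err < best[0]:
                    best = (err, f, thr, sign)
    return best

def adaboost(X, y, T=100):
    """Standard AdaBoost: the multiplicative weight update below is the
    coordinate-descent step on the exponential loss e^(-u)."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(T):
        err, f, thr, sign = best_stump(X, y, w)
        err = max(err, 1e-12)       # guard against a perfect stump
        if err >= 0.5:
            break                   # no better-than-random stump left
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, f, thr, sign))
        w = [wi * math.exp(-alpha * yi * sign * (1 if xi[f] >= thr else -1))
             for xi, yi, wi in zip(X, y, w)]
        z = sum(w)
        w = [wi / z for wi in w]
    return ensemble

def predict(ensemble, x):
    score = sum(alpha * sign * (1 if x[f] >= thr else -1)
                for alpha, f, thr, sign in ensemble)
    return 1 if score >= 0 else -1
```

MULTIBOOST additionally handles multiple sources (e.g. via the domain probabilities Q(k|x)), which this single-source sketch omits.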
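The setup row also states that the domain probabilities Q(k|x) are estimated by labeling each sample with its source index k and fitting multinomial logistic regression. A minimal pure-Python sketch of such an estimator (batch gradient descent on the softmax cross-entropy; the function names, learning rate, and epoch count are assumptions, not from the paper):

```python
import math

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def fit_multinomial(X, k_labels, p, lr=0.5, epochs=200):
    """Learn W, b so that Q(k|x) = softmax(Wx + b)_k, where k_labels[i]
    is the source index of example X[i] and p is the number of sources."""
    d, n = len(X[0]), len(X)
    W = [[0.0] * d for _ in range(p)]
    b = [0.0] * p
    for _ in range(epochs):
        gW = [[0.0] * d for _ in range(p)]
        gb = [0.0] * p
        for x, k in zip(X, k_labels):
            q = softmax([sum(W[c][j] * x[j] for j in range(d)) + b[c]
                         for c in range(p)])
            for c in range(p):
                delta = q[c] - (1.0 if c == k else 0.0)  # dL/dlogit_c
                gb[c] += delta
                for j in range(d):
                    gW[c][j] += delta * x[j]
        for c in range(p):
            b[c] -= lr * gb[c] / n
            for j in range(d):
                W[c][j] -= lr * gW[c][j] / n
    return W, b

def q_given_x(W, b, x):
    """Estimated posterior Q(k|x) over the p sources."""
    p, d = len(W), len(x)
    return softmax([sum(W[c][j] * x[j] for j in range(d)) + b[c]
                    for c in range(p)])
```

In practice an off-the-shelf solver (e.g. a library multinomial logistic regression) would replace this loop; the sketch only shows what "assign label k and fit" means concretely.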