Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Boosting with Multiple Sources
Authors: Corinna Cortes, Mehryar Mohri, Dmitry Storcheus, Ananda Theertha Suresh
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We also report the results of several experiments with our algorithm demonstrating that it outperforms natural baselines on multi-source text-based, image-based and tabular data. We further present an extension of our algorithm to the federated learning scenario and report favorable experimental results for that setting as well. |
| Researcher Affiliation | Collaboration | Corinna Cortes, Google Research, New York, NY 10011 (EMAIL); Mehryar Mohri, Google & Courant Institute, New York, NY 10012 (EMAIL); Dmitry Storcheus, Courant Institute & Google, New York, NY 10012 (EMAIL); Ananda Theertha Suresh, Google Research, New York, NY 10011 (EMAIL) |
| Pseudocode | Yes | The pseudocode of our algorithm, MULTIBOOST, is provided in Figure 1. |
| Open Source Code | No | The paper does not provide any explicit statement or link indicating the availability of open-source code for the described methodology. |
| Open Datasets | Yes | Datasets and preprocessing steps used are described below, with additional dataset details provided in Appendix H. Note that all datasets are public and do not contain any personal identifiers or offensive information. |
| Dataset Splits | Yes | The errors and their standard deviations are reported based on 10-fold cross-validation. Each source S_k is independently split into 10 folds S_k^1, ..., S_k^10. For the i-th cross-validation step, the test set is {S_1^i, ..., S_p^i}, while the rest is used for training. (A sketch of this splitting scheme appears after the table.) |
| Hardware Specification | Yes | The experiments were performed on Linux and Mac workstations with Quad-Core Intel Core i7 2.9 GHz and Intel Xeon 2.20 GHz respectively. |
| Software Dependencies | No | The paper does not provide specific version numbers for ancillary software or libraries used in the experiments. |
| Experiment Setup | Yes | Our study is restricted to learning an ensemble of decision stumps H_stumps using the exponential surrogate loss Φ(u) = e^(−u). ... We used T = 100 boosting steps for all benchmarks. ... To estimate the probabilities Q(k | x) for k ∈ [p], we assigned the label k to each sample from domain k and used multinomial logistic regression. ... Alternatively, for some experiments we used line search with 100 steps. (Sketches of the stump-ensemble setup and of the Q(k | x) estimation appear after the table.) |
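
The per-source splitting scheme quoted in the Dataset Splits row can be made concrete with a short sketch. This is a minimal illustration under assumed conventions (sources passed as a list of NumPy arrays; the helper name `multi_source_cv_splits` is hypothetical), not the authors' code:

```python
import numpy as np
from sklearn.model_selection import KFold

def multi_source_cv_splits(sources, n_folds=10, seed=0):
    """Yield (train, test) index lists for the scheme quoted above:
    each source S_k is split independently into n_folds folds, and the
    i-th CV step tests on the union of the i-th folds of all sources."""
    kf = KFold(n_splits=n_folds, shuffle=True, random_state=seed)
    # folds[k][i] holds the row indices of fold i within source k.
    folds = [[test for _, test in kf.split(X_k)] for X_k in sources]
    for i in range(n_folds):
        test = [(k, folds[k][i]) for k in range(len(sources))]
        train = [
            (k, np.concatenate([folds[k][j] for j in range(n_folds) if j != i]))
            for k in range(len(sources))
        ]
        yield train, test
```

Each yielded element keeps the source index k alongside the row indices, since the quoted setup treats sources separately rather than pooling them.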
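
The Experiment Setup row fixes the hypothesis set (decision stumps), the surrogate loss Φ(u) = e^(−u), and T = 100 boosting steps. MULTIBOOST itself is not reproduced here, but a single-source reference point with the same ingredients can be obtained from scikit-learn, since AdaBoost performs stagewise minimization of the exponential loss over the chosen base learners; this is a generic stand-in, not the authors' algorithm:

```python
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

# Depth-1 trees are decision stumps; AdaBoost's stagewise updates
# minimize the exponential surrogate Phi(u) = exp(-u), u = y * f(x).
stump_ensemble = AdaBoostClassifier(
    estimator=DecisionTreeClassifier(max_depth=1),
    n_estimators=100,  # matches the T = 100 boosting steps quoted above
)
# Usage: stump_ensemble.fit(X_train, y_train); stump_ensemble.predict(X_test)
```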
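
Likewise, the quoted procedure for estimating the source-membership probabilities Q(k | x) is a standard domain classifier. A minimal sketch, assuming the same list-of-arrays convention as above (the helper name `fit_domain_classifier` is hypothetical):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_domain_classifier(sources, seed=0):
    """Label every sample from source k with k and fit a multinomial
    logistic regression; predict_proba then estimates Q(k | x)."""
    X = np.vstack(sources)
    y = np.concatenate([np.full(len(X_k), k) for k, X_k in enumerate(sources)])
    # The default lbfgs solver fits a multinomial (softmax) model
    # whenever y has more than two classes.
    return LogisticRegression(max_iter=1000, random_state=seed).fit(X, y)

# Example: Q = fit_domain_classifier([X_1, X_2, X_3]).predict_proba(x)
```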