Assisted Learning: A Framework for Multi-Organization Learning
Authors: Xun Xian, Xinran Wang, Jie Ding, Reza Ghanadan
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Theoretical and experimental studies, including real-world medical benchmarks, show that Assisted Learning can often achieve near-oracle learning performance, as if data and training processes were centralized. "We provide numerical demonstrations of the proposed methods in Sections 4.2 and 4.3. For synthetic data, we replicate 20 times for each method. In each replication, we trained on a dataset with size 10^4 and then tested on a dataset with size 10^5. We chose a testing size much larger than the training size in order to produce a fair comparison of out-of-sample predictive performance [17]. For the real data, we trained on 70% of the whole data and tested on the remaining, resampled 20 times." |
| Researcher Affiliation | Collaboration | Xun Xian (xian0044@umn.edu), Xinran Wang (wang8740@umn.edu), and Jie Ding (dingj@umn.edu), School of Statistics, University of Minnesota; Reza Ghanadan (rezaghanadan@google.com), Google Research |
| Pseudocode | Yes | Procedure 1 ("Assisted Learning of Module Alice with m other modules", a general description) and Procedure 2 ("Assisted Learning of Module Alice (a) using Module Bob (b)" for neural networks) |
| Open Source Code | No | The paper provides a link to a project website (http://www.assisted-learning.org), but this website is a general project overview and does not explicitly or directly provide access to the source code for the methodology described in the paper. There is no statement like 'We release our code' or a direct link to a code repository. |
| Open Datasets | Yes | Medical Information Mart for Intensive Care III [36] (MIMIC3) is a comprehensive clinical database... MIMIC3 Benchmarks [37,38] consist of essential medical machine learning tasks... We use the data generated by Friedman1 [58]. |
| Dataset Splits | Yes | For synthetic data: "we replicate 20 times for each method. In each replication, we trained on a dataset with size 10^4 then tested on a dataset with size 10^5." For real data: "we trained on 70% of the whole data and tested on the remaining, resampled 20 times." Stopping rule: "The above procedure of iterative assistance is repeated K times until the cross-validation error of Alice no longer decreases." |
| Hardware Specification | No | The paper does not provide any specific details regarding the hardware used for running the experiments, such as GPU models, CPU types, or cloud computing instances. |
| Software Dependencies | No | The paper mentions various machine learning models and refers to some libraries (e.g., Xgboost, Lightgbm) but does not provide specific version numbers for any software dependencies or programming languages used in the experimental setup. |
| Experiment Setup | No | While the paper describes the models used and the number of rounds of assistance, it does not provide specific hyperparameter values such as learning rates, batch sizes, or optimizer configurations, or other detailed system-level training settings in the main text. |
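The synthetic evaluation protocol quoted in the table (Friedman1 data, 20 replications, training size 10^4, testing size 10^5) can be sketched as follows. This is a minimal reconstruction of the split-and-replication scheme only: the `LinearRegression` base learner and the noise level are placeholder assumptions, not the models used in the paper, and the random seeds are illustrative.

```python
import numpy as np
from sklearn.datasets import make_friedman1  # Friedman1 benchmark generator [58]
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

errors = []
for rep in range(20):  # paper: 20 replications per method
    # Train on 10^4 samples, test on a much larger 10^5-sample set
    # for a fair out-of-sample comparison.
    X_train, y_train = make_friedman1(n_samples=10_000, noise=1.0, random_state=rep)
    X_test, y_test = make_friedman1(n_samples=100_000, noise=1.0, random_state=1000 + rep)

    model = LinearRegression().fit(X_train, y_train)  # placeholder learner
    errors.append(mean_squared_error(y_test, model.predict(X_test)))

print(f"mean out-of-sample MSE over 20 replications: {np.mean(errors):.3f}")
```

For the real-data protocol, `sklearn.model_selection.train_test_split` with `train_size=0.7` and a fresh `random_state` per repetition would reproduce the "70% train, 30% test, resampled 20 times" scheme.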