Assisted Learning: A Framework for Multi-Organization Learning
Authors: Xun Xian, Xinran Wang, Jie Ding, Reza Ghanadan
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Theoretical and experimental studies, including real-world medical benchmarks, show that Assisted Learning can often achieve near-oracle learning performance, as if data and training processes were centralized. "We provide numerical demonstrations of the proposed methods in Sections 4.2 and 4.3. For synthetic data, we replicate 20 times for each method. In each replication, we trained on a dataset with size 10^4 and then tested on a dataset with size 10^5. We chose a testing size much larger than the training size in order to produce a fair comparison of out-of-sample predictive performance [17]. For the real data, we trained on 70% of the whole data and tested on the remaining, resampled 20 times." |
| Researcher Affiliation | Collaboration | Xun Xian (xian0044@umn.edu), Xinran Wang (wang8740@umn.edu), and Jie Ding (dingj@umn.edu), School of Statistics, University of Minnesota; Reza Ghanadan (rezaghanadan@google.com), Google Research |
| Pseudocode | Yes | Procedure 1 ("Assisted Learning of Module Alice with m other modules", a general description) and Procedure 2 ("Assisted Learning of Module Alice (a) using Module Bob (b)" for neural networks) |
| Open Source Code | No | The paper provides a link to a project website (http://www.assisted-learning.org), but this website is a general project overview and does not explicitly or directly provide access to the source code for the methodology described in the paper. There is no statement like 'We release our code' or a direct link to a code repository. |
| Open Datasets | Yes | Medical Information Mart for Intensive Care III [36] (MIMIC3) is a comprehensive clinical database... MIMIC3 Benchmarks [37,38] consist of essential medical machine learning tasks... We use the data generated by Friedman1 [58]. |
| Dataset Splits | Yes | For synthetic data: "we replicate 20 times for each method. In each replication, we trained on a dataset with size 10^4 then tested on a dataset with size 10^5." For real data: "we trained on 70% of the whole data and tested on the remaining, resampled 20 times." Stopping rule: "The above procedure of iterative assistance is repeated K times until the cross-validation error of Alice no longer decreases." |
| Hardware Specification | No | The paper does not provide any specific details regarding the hardware used for running the experiments, such as GPU models, CPU types, or cloud computing instances. |
| Software Dependencies | No | The paper mentions various machine learning models and refers to some libraries (e.g., Xgboost, Lightgbm) but does not provide specific version numbers for any software dependencies or programming languages used in the experimental setup. |
| Experiment Setup | No | While the paper describes the models used and the number of rounds of assistance, it does not provide specific hyperparameter values such as learning rates, batch sizes, or optimizer configurations, or other detailed system-level training settings in the main text. |
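The synthetic evaluation protocol quoted in the table (Friedman1 data, 20 replications, training size 10^4, testing size 10^5) can be sketched as follows. This is a minimal reconstruction of the split-and-replication scheme only: the `LinearRegression` base learner and the noise level are placeholder assumptions, not the models used in the paper, and the random seeds are illustrative.

```python
import numpy as np
from sklearn.datasets import make_friedman1  # Friedman1 benchmark generator [58]
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

errors = []
for rep in range(20):  # paper: 20 replications per method
    # Train on 10^4 samples, test on a much larger 10^5-sample set
    # for a fair out-of-sample comparison.
    X_train, y_train = make_friedman1(n_samples=10_000, noise=1.0, random_state=rep)
    X_test, y_test = make_friedman1(n_samples=100_000, noise=1.0, random_state=1000 + rep)

    model = LinearRegression().fit(X_train, y_train)  # placeholder learner
    errors.append(mean_squared_error(y_test, model.predict(X_test)))

print(f"mean out-of-sample MSE over 20 replications: {np.mean(errors):.3f}")
```

For the real-data protocol, `sklearn.model_selection.train_test_split` with `train_size=0.7` and a fresh `random_state` per repetition would reproduce the "70% train, 30% test, resampled 20 times" scheme.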