Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Estimating Uncertainty Intervals from Collaborating Networks
Authors: Tianhui Zhou, Yitong Li, Yuan Wu, David Carlson
JMLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, learning is straightforward and robust. We benchmark CN against several common approaches on two synthetic and six real-world datasets, including forecasting A1c values in diabetic patients from electronic health records, where uncertainty is critical. In the synthetic data, the proposed approach essentially matches ground truth. In the real-world datasets, CN improves results on many performance metrics, including log-likelihood estimates, mean absolute errors, coverage estimates, and prediction interval widths. |
| Researcher Affiliation | Academia | Tianhui Zhou EMAIL Department of Biostatistics and Bioinformatics Duke University Durham, NC 27705, USA; Yitong Li EMAIL Department of Electrical and Computer Engineering Duke University Durham, NC 27705, USA; Yuan Wu EMAIL Department of Biostatistics and Bioinformatics Duke University Durham, NC 27705, USA; David Carlson EMAIL Departments of Civil and Environmental Engineering, Biostatistics and Bioinformatics, Electrical and Computer Engineering, and Computer Science Duke University Durham, NC 27705, USA |
| Pseudocode | Yes | We describe the full learning strategy in Section 4.2 and provide pseudo-code in Algorithm 1. |
| Open Source Code | Yes | The code to reproduce the experiments is publicly available at https://github.com/thuizhou/Collaborating-Networks. |
| Open Datasets | Yes | The first four datasets are publicly available UCI datasets (http://archive.ics.uci.edu/ml/datasets) with relatively small sample size. They are the Computer Hardware Data Set (CPU), Individual household electric power consumption Data Set (Energy), Auto MPG Data Set (MPG), and Communities and Crime Data Set (Crime)... The fifth is a publicly available Kaggle dataset (https://www.kaggle.com/usdot/flight-delays) which tracks the delay of domestic flights by large airline carriers (Airline). |
| Dataset Splits | Yes | Training and evaluation follow a 0.6/0.4 split. |
| Hardware Specification | Yes | When we trained the networks with a single NVIDIA P100 GPU, the pre-training process ran at 483 it/s, and updates of g and f in the joint learning process ran at 152 it/s with batch size of 128 and an input feature space size of less than 50. |
| Software Dependencies | Yes | The implementation of the exact Gaussian process regression is based on python package gpytorch, https://docs.gpytorch.ai/en/v1.1.1/examples/01_Exact_GPs/. |
| Experiment Setup | Yes | The learning rate for all methods is fixed to be 1e-5 with ADAM optimizer (Kingma and Ba, 2014). We set the batch size to be equal to the sample size of 100 throughout all methods. ... The learning rate for g is 1e-4 and f is 5e-4. The batch size is set as 200 for all training sizes. ... For methods that rely on stochastic gradient descent (CN, DP, CDP, PPGPR, and EN), we set the batch size for CPU and MPG datasets to 64, and the rest of the datasets to 128. We train with 300 epochs in each experiment. |
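The 0.6/0.4 train/evaluation split quoted in the table can be sketched as a simple index partition; the function name and seed below are illustrative, not taken from the paper's code:

```python
import random

def split_indices(n, train_frac=0.6, seed=0):
    """Shuffle indices and partition them into train/eval sets."""
    rng = random.Random(seed)
    idx = list(range(n))
    rng.shuffle(idx)
    cut = int(n * train_frac)
    return idx[:cut], idx[cut:]

# Example: 1000 samples -> 600 for training, 400 for evaluation.
train_idx, eval_idx = split_indices(1000)
```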
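The per-network learning rates in the setup (1e-4 for g, 5e-4 for f) imply one optimizer per network. A minimal PyTorch sketch of that wiring, where the layer sizes are purely hypothetical stand-ins for the paper's two collaborating networks:

```python
import torch

# Hypothetical stand-in architectures for the two collaborating networks
# g and f; the paper does not specify these exact layer sizes here.
g = torch.nn.Sequential(torch.nn.Linear(10, 32), torch.nn.ReLU(), torch.nn.Linear(32, 1))
f = torch.nn.Sequential(torch.nn.Linear(10, 32), torch.nn.ReLU(), torch.nn.Linear(32, 1))

# Separate Adam optimizers with the learning rates quoted in the setup.
opt_g = torch.optim.Adam(g.parameters(), lr=1e-4)
opt_f = torch.optim.Adam(f.parameters(), lr=5e-4)
```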
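The exact Gaussian process baseline cited under Software Dependencies uses gpytorch. A minimal sketch of exact GP regression in the style of the linked tutorial; the kernel choice, toy data, and training settings below are illustrative assumptions, not the paper's actual configuration:

```python
import torch
import gpytorch

class ExactGPModel(gpytorch.models.ExactGP):
    """Exact GP regression model with a constant mean and scaled RBF kernel."""
    def __init__(self, train_x, train_y, likelihood):
        super().__init__(train_x, train_y, likelihood)
        self.mean_module = gpytorch.means.ConstantMean()
        self.covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel())

    def forward(self, x):
        return gpytorch.distributions.MultivariateNormal(
            self.mean_module(x), self.covar_module(x)
        )

# Toy 1-D regression data (illustrative only).
train_x = torch.linspace(0, 1, 20)
train_y = torch.sin(train_x * 6.28) + 0.1 * torch.randn(20)

likelihood = gpytorch.likelihoods.GaussianLikelihood()
model = ExactGPModel(train_x, train_y, likelihood)

# Fit hyperparameters by maximizing the exact marginal log-likelihood.
model.train()
likelihood.train()
optimizer = torch.optim.Adam(model.parameters(), lr=0.1)
mll = gpytorch.mlls.ExactMarginalLogLikelihood(likelihood, model)
for _ in range(30):
    optimizer.zero_grad()
    loss = -mll(model(train_x), train_y)
    loss.backward()
    optimizer.step()
```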