Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Estimating Uncertainty Intervals from Collaborating Networks
Authors: Tianhui Zhou, Yitong Li, Yuan Wu, David Carlson
JMLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, learning is straightforward and robust. We benchmark CN against several common approaches on two synthetic and six real-world datasets, including forecasting A1c values in diabetic patients from electronic health records, where uncertainty is critical. In the synthetic data, the proposed approach essentially matches ground truth. In the real-world datasets, CN improves results on many performance metrics, including log-likelihood estimates, mean absolute errors, coverage estimates, and prediction interval widths. |
| Researcher Affiliation | Academia | Tianhui Zhou EMAIL Department of Biostatistics and Bioinformatics Duke University Durham, NC 27705, USA; Yitong Li EMAIL Department of Electrical and Computer Engineering Duke University Durham, NC 27705, USA; Yuan Wu EMAIL Department of Biostatistics and Bioinformatics Duke University Durham, NC 27705, USA; David Carlson EMAIL Departments of Civil and Environmental Engineering, Biostatistics and Bioinformatics, Electrical and Computer Engineering, and Computer Science Duke University Durham, NC 27705, USA |
| Pseudocode | Yes | We describe the full learning strategy in Section 4.2 and provide pseudo-code in Algorithm 1. |
| Open Source Code | Yes | The code to reproduce the experiments is publicly available at https://github.com/thuizhou/Collaborating-Networks. |
| Open Datasets | Yes | The first four datasets are publicly available UCI datasets (http://archive.ics.uci.edu/ml/datasets) with relatively small sample size. They are the Computer Hardware Data Set (CPU), Individual household electric power consumption Data Set (Energy), Auto MPG Data Set (MPG), and Communities and Crime Data Set (Crime)... The fifth is a publicly available Kaggle dataset (https://www.kaggle.com/usdot/flight-delays) which tracks the delay of domestic flights by large airline carriers (Airline). |
| Dataset Splits | Yes | Training and evaluation follow a 0.6/0.4 split. |
| Hardware Specification | Yes | When we trained the networks with a single NVIDIA P100 GPU, the pre-training process ran at 483 it/s, and updates of g and f in the joint learning process ran at 152 it/s with batch size of 128 and an input feature space size of less than 50. |
| Software Dependencies | Yes | The implementation of the exact Gaussian process regression is based on python package gpytorch, https://docs.gpytorch.ai/en/v1.1.1/examples/01_Exact_GPs/. |
| Experiment Setup | Yes | The learning rate for all methods is fixed to be 1e-5 with ADAM optimizer (Kingma and Ba, 2014). We set the batch size to be equal to the sample size of 100 throughout all methods. ... The learning rate for g is 1e-4 and f is 5e-4. The batch size is set as 200 for all training sizes. ... For methods that rely on stochastic gradient descent (CN, DP, CDP, PPGPR, and EN), we set the batch size for CPU and MPG datasets to 64, and the rest of the datasets to 128. We train with 300 epochs in each experiment. |
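The 0.6/0.4 train/evaluation split quoted in the table can be sketched as a simple index partition; the function name and seed below are illustrative, not taken from the paper's code:

```python
import random

def split_indices(n, train_frac=0.6, seed=0):
    """Shuffle indices and partition them into train/eval sets."""
    rng = random.Random(seed)
    idx = list(range(n))
    rng.shuffle(idx)
    cut = int(n * train_frac)
    return idx[:cut], idx[cut:]

# Example: 1000 samples -> 600 for training, 400 for evaluation.
train_idx, eval_idx = split_indices(1000)
```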
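The per-network learning rates in the setup (1e-4 for g, 5e-4 for f) imply one optimizer per network. A minimal PyTorch sketch of that wiring, where the layer sizes are purely hypothetical stand-ins for the paper's two collaborating networks:

```python
import torch

# Hypothetical stand-in architectures for the two collaborating networks
# g and f; the paper does not specify these exact layer sizes here.
g = torch.nn.Sequential(torch.nn.Linear(10, 32), torch.nn.ReLU(), torch.nn.Linear(32, 1))
f = torch.nn.Sequential(torch.nn.Linear(10, 32), torch.nn.ReLU(), torch.nn.Linear(32, 1))

# Separate Adam optimizers with the learning rates quoted in the setup.
opt_g = torch.optim.Adam(g.parameters(), lr=1e-4)
opt_f = torch.optim.Adam(f.parameters(), lr=5e-4)
```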
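The exact Gaussian process baseline cited under Software Dependencies uses gpytorch. A minimal sketch of exact GP regression in the style of the linked tutorial; the kernel choice, toy data, and training settings below are illustrative assumptions, not the paper's actual configuration:

```python
import torch
import gpytorch

class ExactGPModel(gpytorch.models.ExactGP):
    """Exact GP regression model with a constant mean and scaled RBF kernel."""
    def __init__(self, train_x, train_y, likelihood):
        super().__init__(train_x, train_y, likelihood)
        self.mean_module = gpytorch.means.ConstantMean()
        self.covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel())

    def forward(self, x):
        return gpytorch.distributions.MultivariateNormal(
            self.mean_module(x), self.covar_module(x)
        )

# Toy 1-D regression data (illustrative only).
train_x = torch.linspace(0, 1, 20)
train_y = torch.sin(train_x * 6.28) + 0.1 * torch.randn(20)

likelihood = gpytorch.likelihoods.GaussianLikelihood()
model = ExactGPModel(train_x, train_y, likelihood)

# Fit hyperparameters by maximizing the exact marginal log-likelihood.
model.train()
likelihood.train()
optimizer = torch.optim.Adam(model.parameters(), lr=0.1)
mll = gpytorch.mlls.ExactMarginalLogLikelihood(likelihood, model)
for _ in range(30):
    optimizer.zero_grad()
    loss = -mll(model(train_x), train_y)
    loss.backward()
    optimizer.step()
```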