Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Beyond the Mean-Field: Structured Deep Gaussian Processes Improve the Predictive Uncertainties
Authors: Jakob Lindinger, David Reeb, Christoph Lippert, Barbara Rakitsch
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We apply it to several benchmark datasets. It yields excellent results and strikes a better balance between accuracy and calibrated uncertainty estimates than its state-of-the-art alternatives. In Sec. 4, we show experimentally that the new algorithm works well in practice. |
| Researcher Affiliation | Collaboration | ¹Bosch Center for Artificial Intelligence, Renningen, Germany; ²Hasso Plattner Institute, Potsdam, Germany; ³University of Potsdam, Germany |
| Pseudocode | Yes | A pseudocode description of our algorithm is given in Appx. F. |
| Open Source Code | Yes | Python code (building on code for the mean-field DGP [25], GPflow [19] and TensorFlow [1]) implementing our method is provided at https://github.com/boschresearch/Structured_DGP. |
| Open Datasets | Yes | We report results on eight UCI datasets and employ as evaluation criterion the average marginal test log-likelihood (tll). |
| Dataset Splits | Yes | We assessed the interpolation behaviour of the different approaches by randomly partitioning the data into a training and a test set with a 90 : 10 split. To investigate the extrapolation behaviour, we created test instances that are distant from the training samples: ...divided them accordingly into training and test set using a 50 : 50 split. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running experiments, such as GPU/CPU models, memory specifications, or cloud computing instance types. |
| Software Dependencies | No | The paper mentions software like 'GPflow' and 'TensorFlow' but does not specify their version numbers, which is required for a reproducible description of ancillary software. |
| Experiment Setup | Yes | We used a fully-coupled DGP with our standard three layer architecture (see Sec. 3.2), on the concrete UCI dataset trained with Adam [15]. For our standard setting, M = 128, our STAR approximation was only two times slower than the mean-field but three times faster than FC DGP (trained with Adam [15]). |
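
The 90 : 10 interpolation split and the test log-likelihood criterion quoted in the table can be illustrated with a short sketch. This is not the authors' code: the function names, the seed, and the assumption that each test point has a single Gaussian predictive marginal are all hypothetical choices made here for illustration, and the paper's DGP predictive may be computed differently. The extrapolation split is left out because its construction is elided in the quoted text.

```python
import math
import random


def random_split(n_points, test_fraction=0.1, seed=0):
    """Randomly partition indices 0..n_points-1 into (train, test) index lists.

    Hypothetical helper: the seed and rounding rule are assumptions.
    """
    indices = list(range(n_points))
    random.Random(seed).shuffle(indices)
    n_test = round(n_points * test_fraction)
    return indices[n_test:], indices[:n_test]


def average_gaussian_tll(y_true, pred_mean, pred_var):
    """Average log-density of each test target under its own predictive marginal.

    Assumes a single Gaussian marginal per test point (an illustrative
    simplification, not necessarily the paper's predictive distribution).
    """
    total = 0.0
    for y, mu, var in zip(y_true, pred_mean, pred_var):
        total += -0.5 * (math.log(2.0 * math.pi * var) + (y - mu) ** 2 / var)
    return total / len(y_true)


# 90 : 10 split of a hypothetical dataset with 1000 points
train_idx, test_idx = random_split(1000, test_fraction=0.1, seed=0)
```

Scoring each test point separately and then averaging is what makes the criterion a marginal log-likelihood: correlations between test points do not enter the score.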