Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Implicit Posterior Variational Inference for Deep Gaussian Processes
Authors: Haibin YU, Yizhou Chen, Bryan Kian Hsiang Low, Patrick Jaillet, Zhongxiang Dai
NeurIPS 2019 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical evaluation shows that IPVI outperforms the state-of-the-art approximation methods for DGPs. We empirically evaluate and compare the performance of our IPVI framework against that of the state-of-the-art SGHMC [18] and doubly stochastic VI [48] for DGPs based on their publicly available implementations using synthetic and real-world datasets in supervised (e.g., regression and classification) and unsupervised learning tasks. |
| Researcher Affiliation | Academia | Dept. of Computer Science, National University of Singapore, Republic of Singapore; Dept. of Electrical Engineering and Computer Science, MIT, USA |
| Pseudocode | Yes | Figure 1: Best-response dynamics (BRD) algorithm based on our IPVI framework for DGPs. Algorithm 1: Main, Algorithm 2: Player 1, Algorithm 3: Player 2 |
| Open Source Code | Yes | Our implementation is built on GPflow [41] which is an open-source GP framework based on TensorFlow [1]. It is publicly available at https://github.com/HeroKillerEver/ipvi-dgp. |
| Open Datasets | Yes | We empirically evaluate and compare the performance of our IPVI framework...using synthetic and real-world datasets...UCI Benchmark Regression...Large-Scale Regression...YearMSD dataset...Airline dataset...Frey Face dataset [47] |
| Dataset Splits | No | We have performed a random 0.9/0.1 train/test split. The paper does not explicitly describe a validation dataset split or provide specific percentages for it. |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware (e.g., CPU, GPU models, memory) used to run its experiments, only mentioning the use of TensorFlow. |
| Software Dependencies | No | Our implementation is built on GPflow [41] which is an open-source GP framework based on TensorFlow [1]. (No specific version numbers for GPflow or TensorFlow are provided in the text). |
| Experiment Setup | Yes | the depth L of the DGP models are varied from 1 to 5 with 128 inducing inputs per layer. The learning rates are 0.005 and 0.02 for IPVI and SGHMC (default setting adopted from [18]), respectively. We utilize a 4-layer DGP model with 100 inducing inputs per layer and a robust-max multiclass likelihood [21]; for MNIST dataset, we also consider utilizing a 4-layer DGP model with 800 inducing inputs per layer to assess if its performance improves with more inducing inputs. |
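
The Dataset Splits row quotes a random 0.9/0.1 train/test split. The following is a minimal sketch of how such a split could be made repeatable, not the authors' code: the function name `train_test_split_indices`, the `seed` value, and the sample count of 1000 are all assumptions for illustration, since the quoted text does not state a seed or splitting procedure.

```python
import random


def train_test_split_indices(n_samples, test_fraction=0.1, seed=0):
    """Return (train_idx, test_idx) for a random split of range(n_samples).

    Hypothetical helper: the seed and the rounding rule are assumptions,
    not details taken from the paper.
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be in (0, 1)")
    indices = list(range(n_samples))
    # A dedicated Random instance keeps the shuffle independent of global state.
    random.Random(seed).shuffle(indices)
    n_test = int(round(n_samples * test_fraction))
    # The first n_test shuffled indices form the test set; the rest form the training set.
    return indices[n_test:], indices[:n_test]


# Example with an illustrative dataset size of 1000 samples.
train_idx, test_idx = train_test_split_indices(1000, test_fraction=0.1, seed=0)
print(len(train_idx), len(test_idx))
```

Recording the seed alongside the split ratio is what would let a reader regenerate the same partition.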