Scalable Deep Gaussian Markov Random Fields for General Graphs
Authors: Joel Oskarsson, Per Sidén, Fredrik Lindsten
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The usefulness of the proposed model is verified by experiments on a number of synthetic and real-world datasets, where it compares favorably to other Bayesian and deep learning methods. |
| Researcher Affiliation | Collaboration | 1) Division of Statistics and Machine Learning, Department of Computer and Information Science, Linköping University, Linköping, Sweden; 2) Arriver Software AB. |
| Pseudocode | No | No pseudocode or clearly labeled algorithm block was found in the paper. |
| Open Source Code | Yes | Our code is available at https://github.com/joeloskarsson/graph-dgmrf. |
| Open Datasets | Yes | Wikipedia graphs were created and made available by Rozemberczki et al. (2021); the classical California housing dataset (Kelley Pace & Barry, 1997) contains median house values of 20 640 housing blocks located in California, and based on their spatial coordinates a sparse graph is created by Delaunay triangulation (De Loera et al., 2010); the wind speed data originates from the Wind Integration National Dataset Toolkit. (A graph-construction sketch follows the table.) |
| Dataset Splits | Yes | We generally treat 50% of nodes as unobserved, chosen uniformly at random; for the MLP baseline we consider the layer configurations (number of hidden layers × hidden dimensionality) {1×128, 1×512, 2×128, 2×512}... The layer configuration resulting in the lowest validation loss is then used for the ensemble; the GNN models are trained in the same way as the MLP, also using 20% of the observed nodes for validation. (A split sketch follows the table.) |
| Hardware Specification | No | The paper mentions 'Using a consumer-grade GPU' but does not specify the exact model or other detailed hardware specifications for the experiments. |
| Software Dependencies | No | The paper mentions software like 'PyTorch', 'PyTorch Geometric', 'GPyTorch', and 'scikit-learn' but does not provide specific version numbers for these dependencies. |
| Experiment Setup | Yes | For training our DGMRF we use a learning rate of 0.01 and the Adam optimizer (Kingma & Ba, 2015) in all experiments. The model has not been observed to be sensitive to these choices, so no extensive tuning has been done. Note that overfitting is not a considerable problem here. If necessary the learning rate can be tuned to make the ELBO converge, using only the training data (observed nodes). On synthetic data we train the model for 50 000 iterations, on the Wikipedia and California housing datasets 80 000 iterations (150 000 for 5-layer DGMRFs) and for the wind speed data 150 000 iterations. These numbers are large enough for the ELBO to converge and often unnecessarily high, meaning that runtimes could be slightly reduced with a more careful choice. In all experiments we use one DGMRF layer for G in the variational distribution q (see Eq. 11). At each iteration of training we draw 10 samples from q to estimate the expectation in the ELBO. (A training-loop sketch follows the table.) |
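
To make the Open Datasets row concrete, below is a minimal sketch of building a sparse graph over the California housing blocks by Delaunay triangulation of their spatial coordinates. It uses scikit-learn's `fetch_california_housing` and `scipy.spatial.Delaunay`; the column choices and edge construction are assumptions and may differ from the preprocessing in the authors' repository.

```python
# Sketch: California housing graph via Delaunay triangulation (not the authors' pipeline).
import numpy as np
from scipy.spatial import Delaunay
from sklearn.datasets import fetch_california_housing

housing = fetch_california_housing(as_frame=True)
coords = housing.frame[["Longitude", "Latitude"]].to_numpy()  # (20640, 2) block coordinates
y = housing.frame["MedHouseVal"].to_numpy()                   # median house value per block

tri = Delaunay(coords)
edges = set()
for simplex in tri.simplices:                 # each simplex is a triangle (i, j, k)
    for a in range(3):
        for b in range(a + 1, 3):
            i, j = sorted((int(simplex[a]), int(simplex[b])))
            edges.add((i, j))                 # add each triangle edge once, undirected

edge_index = np.array(sorted(edges)).T        # (2, num_edges) edge list for the sparse graph
print(edge_index.shape, y.shape)
```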
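The Dataset Splits row describes the masking scheme; the sketch below shows one plausible way to realize it, holding out 50% of nodes as unobserved uniformly at random and reserving 20% of the observed nodes for validation of the MLP/GNN baselines. The seed and node count are illustrative assumptions.

```python
# Sketch: node-level split with 50% unobserved and a 20% validation subset of the observed nodes.
import numpy as np

rng = np.random.default_rng(0)                # arbitrary seed for illustration
n_nodes = 20_640                              # e.g. the California housing graph
perm = rng.permutation(n_nodes)

unobserved = perm[: n_nodes // 2]             # 50% of nodes unobserved (prediction targets)
observed = perm[n_nodes // 2 :]

n_val = int(0.2 * observed.size)
val_nodes = observed[:n_val]                  # 20% of observed nodes for validation
train_nodes = observed[n_val:]

obs_mask = np.zeros(n_nodes, dtype=bool)      # boolean mask over all nodes
obs_mask[observed] = True
```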
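Finally, a hypothetical training-loop skeleton matching the Experiment Setup row: Adam with learning rate 0.01 and 10 samples from the variational distribution q per iteration to estimate the expectation in the ELBO. The `model.elbo(...)`, `q.rsample(...)`, and `q.entropy()` calls are placeholder interfaces, not the authors' actual API; the real implementation is in the linked repository.

```python
# Sketch: variational training loop with the reported optimizer settings (placeholder model API).
import torch

def train_dgmrf(model, q, y_obs, obs_mask, n_iterations=50_000, n_samples=10, lr=0.01):
    params = list(model.parameters()) + list(q.parameters())
    optimizer = torch.optim.Adam(params, lr=lr)
    for _ in range(n_iterations):
        optimizer.zero_grad()
        x = q.rsample((n_samples,))                        # 10 reparameterized samples from q
        # Monte Carlo ELBO estimate: mean over samples plus the entropy of q
        elbo = model.elbo(x, y_obs, obs_mask).mean() + q.entropy()
        (-elbo).backward()                                 # maximize ELBO = minimize -ELBO
        optimizer.step()
    return model, q
```

The iteration count would be set per dataset (50 000, 80 000, or 150 000) as quoted in the table.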