On Neural Networks as Infinite Tree-Structured Probabilistic Graphical Models

Authors: Boyao Li, Alexander Thomson, Houssam Nassif, Matthew Engelhard, David Page

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate empirically that after training a network quickly using SGD, calibration can be improved by fine-tuning using HMC. The specific HMC algorithm employed here follows directly from the theoretical result, being designed to approximate Gibbs sampling in the theoretical, infinite-width tree-structured Markov network. The degree of approximation is controlled by the value of a single hyperparameter that is also defined based on the theoretical result. (...) Finally, in the context of sigmoid activations we empirically evaluate how the second and third benefits listed above follow from the result, as motivated and summarized now in the next paragraph. (A generic HMC sketch appears after this table.)
Researcher Affiliation | Collaboration | Boyao Li, Department of Biostatistics and Bioinformatics, Duke University, boyao.li@duke.edu; Alexander J. Thomson, Department of Computer Science, Duke University, alexander.thomson@duke.edu; Houssam Nassif, Meta Inc., houssamn@meta.com; Matthew M. Engelhard, Department of Biostatistics and Bioinformatics, Duke University, m.engelhard@duke.edu; David Page, Department of Biostatistics and Bioinformatics, Duke University, david.page@duke.edu
Pseudocode | Yes | Algorithm 1: Step 1 of the PGM Construction (...) Algorithm 2: Step 2 of the PGM Construction (...) Algorithm 3: CD-k Learning for the Deep Belief Network
Open Source Code | Yes | All code needed to reproduce our experimental results may be found at https://github.com/engelhard-lab/DNN_TreePGM.
Open Datasets | Yes | The synthetic datasets are generated by simple BNs and MNs with their weights in different ranges, which are used to define the conditional probability distributions for BNs and potentials for MNs. Each dataset contains 1000 data points {(Xi, yi)}, i = 1, 2, ..., 1000, where each input Xi ∈ {0, 1}^n is a binary vector with n dimensions and each output yi ∈ {0, 1} is a binary value. (...) Similar experiments are also run on the Covertype dataset to compare the calibration of SGD in DNNs, Gibbs, and the HMC-based algorithm. Since the ground truth for the distribution P(y|X) cannot be found, the metric for calibration used in this experiment is the expected calibration error (ECE), which is a common metric for model calibration. To simplify the classification task, we choose the data with labels 1 and 2 and build two binary subsets, each of which contains 1000 data points. (...) Covertype. UCI Machine Learning Repository, 1998. DOI: https://doi.org/10.24432/C50K5N. (A hypothetical synthetic-data sketch appears after this table.)
Dataset Splits | Yes | For all the experiments, the train-test split ratio is 80:20.
Hardware Specification | No | An internal cluster of GPUs was employed for all experiments, and some of the run-times are provided in the appendix; as anticipated, SGD is faster than HMC, which is faster than Gibbs. (No specific GPU model or detailed cluster specs are provided beyond 'GPUs' and 'internal cluster'.)
Software Dependencies | No | Adam optimizer is used with learning rate being 1 × 10^-4. (No specific version numbers are provided for any software components like Adam, Python, or relevant libraries.)
Experiment Setup | Yes | For all the experiments, the train-test split ratio is 80:20. For the training and fine-tuning, the Adam optimizer is used with a learning rate of 1 × 10^-4. To get the predicted probabilities for the fine-tuned network, 1000 output probabilities are sampled and averaged. In synthetic experiments, both BNs and MNs have a structure with input dimension 4, two latent layers with 4 nodes each, and one binary output. (...) Here L defines the normal distribution for hidden nodes in Eqn. 1 and is explored across the set of values {10, 100, 1000}. (...) The number of training epochs is 100 or 1000, while the fine-tuning epochs shown in Table 2 is 20. (A minimal training-setup sketch appears after this table.)
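
The following sketch illustrates the synthetic-data setup quoted in the Open Datasets and Dataset Splits rows: 1000 binary pairs (Xi, yi) with 4-dimensional binary inputs and a single binary label, followed by an 80:20 train-test split. The generating model below (independent Bernoulli inputs and a logistic conditional with random weights) is a hypothetical stand-in for the paper's BNs and MNs, whose exact structures and weight ranges are not reproduced here.

```python
# Hypothetical synthetic-data sketch; the generating network is illustrative,
# not the BN/MN used in the paper.
import numpy as np

rng = np.random.default_rng(0)
n, n_samples = 4, 1000

# Binary inputs: independent Bernoulli(0.5) variables (assumption).
X = rng.integers(0, 2, size=(n_samples, n))

# Illustrative conditional P(y = 1 | X): a logistic function of a random
# weight vector, standing in for the paper's BN/MN-defined conditionals.
w = rng.uniform(-1.0, 1.0, size=n)
p_y = 1.0 / (1.0 + np.exp(-(X @ w)))
y = rng.binomial(1, p_y)

# 80:20 train-test split, as reported.
perm = rng.permutation(n_samples)
split = int(0.8 * n_samples)
X_train, y_train = X[perm[:split]], y[perm[:split]]
X_test, y_test = X[perm[split:]], y[perm[split:]]
```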
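A minimal sketch of the reported training setup, assuming PyTorch: a network with input dimension 4, two latent layers of 4 nodes, one binary output, and the Adam optimizer with learning rate 1e-4. Sigmoid activations and a binary cross-entropy loss are assumptions suggested by the paper's focus on sigmoid activations; the HMC/Gibbs fine-tuning itself is not implemented here, only the averaging of 1000 sampled output probabilities at prediction time.

```python
import torch
import torch.nn as nn

def build_model():
    # 4-dimensional input, two latent layers of 4 nodes, one binary output.
    return nn.Sequential(
        nn.Linear(4, 4), nn.Sigmoid(),
        nn.Linear(4, 4), nn.Sigmoid(),
        nn.Linear(4, 1),  # output logit; sigmoid applied at prediction time
    )

def train(model, X, y, epochs=1000, lr=1e-4):
    # Adam with learning rate 1e-4; 100 or 1000 epochs as reported.
    X = torch.as_tensor(X, dtype=torch.float32)
    y = torch.as_tensor(y, dtype=torch.float32).unsqueeze(1)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.BCEWithLogitsLoss()
    for _ in range(epochs):
        optimizer.zero_grad()
        loss = loss_fn(model(X), y)
        loss.backward()
        optimizer.step()
    return model

def predict_proba(sample_logits, X, n_samples=1000):
    # Average 1000 sampled output probabilities, as described for the
    # fine-tuned network; `sample_logits` stands in for one stochastic
    # forward pass drawn from the fine-tuned model.
    X = torch.as_tensor(X, dtype=torch.float32)
    with torch.no_grad():
        probs = torch.stack(
            [torch.sigmoid(sample_logits(X)) for _ in range(n_samples)]
        )
    return probs.mean(dim=0)
```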
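For orientation only, here is a generic Hamiltonian Monte Carlo transition over a flattened parameter vector, matching the SGD-then-HMC workflow quoted in the Research Type row. It is not the paper's algorithm: the authors' HMC variant is constructed to approximate Gibbs sampling in the infinite-width tree-structured Markov network and is governed by an additional hyperparameter, none of which is modeled below. The `log_prob` argument is a user-supplied (unnormalized) log-posterior and is assumed for illustration.

```python
import torch

def hmc_step(theta, log_prob, step_size=1e-3, n_leapfrog=20):
    """One generic HMC transition targeting exp(log_prob(theta))."""

    def grad_log_prob(t):
        t = t.detach().requires_grad_(True)
        lp = log_prob(t)
        (g,) = torch.autograd.grad(lp, t)
        return lp.detach(), g

    theta = theta.detach()
    momentum = torch.randn_like(theta)
    lp0, g = grad_log_prob(theta)
    current_h = lp0 - 0.5 * momentum.pow(2).sum()   # negative Hamiltonian

    # Leapfrog integration of the Hamiltonian dynamics.
    t_new, p_new = theta.clone(), momentum + 0.5 * step_size * g
    for i in range(n_leapfrog):
        t_new = t_new + step_size * p_new
        lp_new, g = grad_log_prob(t_new)
        if i < n_leapfrog - 1:
            p_new = p_new + step_size * g
    p_new = p_new + 0.5 * step_size * g
    proposed_h = lp_new - 0.5 * p_new.pow(2).sum()

    # Metropolis correction: accept the proposal or keep the current state.
    if torch.rand(()) < torch.exp(proposed_h - current_h):
        return t_new
    return theta
```

In this sketch, a flattened parameter vector could be obtained after SGD training with torch.nn.utils.parameters_to_vector(model.parameters()), and repeated calls to hmc_step would then play the role of the fine-tuning phase.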