Parsimonious Bayesian deep networks

Authors: Mingyuan Zhou

NeurIPS 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We apply PBDN to four different MNIST binary classification tasks and compare its performance with DNN (128-64), a two-hidden-layer deep neural network that will be described in detail below. As shown in Tab. 1, both AIC and AIC_{ε=0.01} infer the depth as T = 1 for PBDN, and infer for each class only a few active hyperplanes, each of which represents a distinct data subtype, as calculated with (2). In a random trial, the inferred networks of PBDN for all four tasks have only a single hidden layer with at most 6 active hidden units. Thus its testing computation is much lower than that of DNN (128-64), while providing an overall lower testing error rate (both trained with 4000 mini-batches of size 100). (A sketch of this kind of AIC-based selection is given after this table.)
Researcher Affiliation | Academia | Mingyuan Zhou, Department of IROM, McCombs School of Business, The University of Texas at Austin, Austin, TX 78712; mingyuan.zhou@mccombs.utexas.edu
Pseudocode | Yes | We describe both Gibbs sampling, desirable for uncertainty quantification, and maximum a posteriori (MAP) inference, suitable for large-scale training, in Algorithm 1. We use data augmentation and marginalization to derive Gibbs sampling, with the details deferred to Appendix B. For MAP inference, we use Adam [32] in Tensorflow to minimize a stochastic objective function $f(\{\beta_k, \ln r_k\}_{k=1}^K, \{y_i, x_i\}_{i=i_1}^{i_M}) + f(\{\beta'_k, \ln r'_k\}_{k=1}^K, \{y'_i, x_i\}_{i=i_1}^{i_M})$, which embeds the hierarchical Bayesian model's inductive bias and inherent shrinking mechanism into optimization, where $M$ is the size of a randomly selected mini-batch, $y'_i := 1 - y_i$, $\lambda_i := \sum_{k=1}^K e^{\ln r_k} \ln(1 + e^{x_i^\top \beta_k})$, and $f(\{\beta_k, \ln r_k\}_{k=1}^K, \{y_i, x_i\}_{i=i_1}^{i_M}) = \sum_{k=1}^K \big[ -\tfrac{\gamma_0}{K} \ln r_k + c_0 e^{\ln r_k} \big] + (a_\beta + 1/2) \sum_{v=0}^{V} \sum_{k=1}^{K} \ln\big(1 + \beta_{vk}^2/(2 b_{\beta k})\big) + \tfrac{N}{M} \sum_{i=i_1}^{i_M} \big[ -y_i \ln(1 - e^{-\lambda_i}) + (1 - y_i)\lambda_i \big]$. (8) (A NumPy sketch of this objective is given after this table.)
Open Source Code | Yes | Code for reproducible research is available at https://github.com/mingyuanzhou/PBDN.
Open Datasets | Yes | We apply PBDN to four different MNIST binary classification tasks and compare its performance with DNN (128-64)... Below we provide a comprehensive comparison on eight widely used benchmark datasets between the proposed PBDNs and a variety of algorithms, including logistic regression, Gaussian radial basis function (RBF) kernel support vector machine (SVM), relevance vector machine (RVM) [31], adaptive multi-hyperplane machine (AMM) [27], convex polytope machine (CPM) [30], and the deep neural network (DNN) classifier (DNNClassifier) provided in Tensorflow [33]. We consider DNN (8-4), a two-hidden-layer DNN that uses 8 and 4 hidden units for its first and second hidden layers, respectively, DNN (32-16), and DNN (128-64). In the Appendix, we summarize in Tab. 4 the information on the eight benchmark datasets: banana, breast cancer, titanic, waveform, german, image, ijcnn1, and a9a.
Dataset Splits | No | The paper mentions 'training/testing partitions' but does not explicitly provide details on how these splits are generated (e.g., percentages, random seed, cross-validation setup) beyond referring to 'widely used open-source software packages'.
Hardware Specification | Yes | M. Zhou acknowledges the support of NVIDIA Corporation with the donation of the Titan Xp GPU used for this research, and the computational support of Texas Advanced Computing Center.
Software Dependencies | No | The paper mentions 'Tensorflow' but does not specify a version number or other software dependencies with version numbers.
Experiment Setup | Yes | For Gibbs sampling, we run 5000 iterations and record $\{r_k, \beta_k\}_k$ with the highest likelihood during the last 2500 iterations; for MAP, we process 4000 mini-batches of size M = 100, with 0.05/(4 + T) as the Adam learning rate for the Tth added iSHM pair. We set $a_0 = b_0 = 0.01$, $e_0 = f_0 = 1$, and $a_\beta = 10^{-6}$ for Gibbs sampling. We fix $\gamma_0 = c_0 = 1$ and $a_\beta = b_{\beta k} = 10^{-6}$ for MAP inference. (These settings are collected in the configuration sketch below.)
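
The depth-selection behavior quoted under Research Type relies on an information criterion to decide how many hidden layers, and which hyperplanes, to keep. Below is a minimal, hypothetical sketch of that kind of AIC-based selection: the function names, the 2k − 2 ln L form of AIC, and the threshold used to count a hyperplane as active are illustrative assumptions, not the paper's exact criterion (the paper counts active units with its Eq. (2)).

```python
import numpy as np

def aic(log_likelihood, num_params):
    # Standard Akaike information criterion: smaller is better.
    return 2.0 * num_params - 2.0 * log_likelihood

def choose_depth(candidate_models):
    """Pick the depth T with the lowest AIC.

    candidate_models: list of (T, log_likelihood, num_params) tuples,
    one per trained network depth (hypothetical bookkeeping format).
    """
    scores = [(aic(ll, k), T) for T, ll, k in candidate_models]
    return min(scores)[1]

def count_active_hyperplanes(r, threshold=1e-2):
    # Treat a hyperplane as active if its inferred weight r_k is non-negligible
    # (an illustrative proxy; the paper uses its own criterion in Eq. (2)).
    return int(np.sum(np.asarray(r) > threshold))
```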
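To make the quoted MAP objective of Eq. (8) concrete, here is a minimal NumPy sketch of the per-minibatch term f(·) for a single class, assuming a design matrix with a leading column of ones so that β_{0k} plays the role of a bias, and a shared shrinkage scale b_beta in place of the per-hyperplane b_{βk}. The function and variable names, and the numerical-stability epsilon, are my own additions.

```python
import numpy as np

def pbdn_map_objective(beta, log_r, X, y, N,
                       gamma0=1.0, c0=1.0, a_beta=1e-6, b_beta=1e-6):
    """Minibatch MAP objective f(.) of Eq. (8) for one class (sketch).

    beta  : (V+1, K) hyperplane weights; row 0 acts as the bias term
    log_r : (K,)     log of the hyperplane weights r_k
    X     : (M, V+1) minibatch features with a leading column of ones
    y     : (M,)     binary labels in {0, 1}
    N     : total number of training examples (rescales the minibatch term)
    """
    M, K = X.shape[0], log_r.shape[0]
    # lambda_i = sum_k exp(ln r_k) * ln(1 + exp(x_i^T beta_k))
    softplus = np.logaddexp(0.0, X @ beta)                # (M, K)
    lam = softplus @ np.exp(log_r)                        # (M,)
    # Negative log of the gamma prior on r_k (up to additive constants).
    prior_r = np.sum(-(gamma0 / K) * log_r + c0 * np.exp(log_r))
    # Heavy-tailed shrinkage penalty on every beta_{vk}.
    prior_beta = (a_beta + 0.5) * np.sum(np.log1p(beta ** 2 / (2.0 * b_beta)))
    # Bernoulli-Poisson log-likelihood, rescaled from minibatch size M to N.
    p_one = -np.expm1(-lam)                               # 1 - exp(-lambda_i)
    loglik = y * np.log(p_one + 1e-12) - (1.0 - y) * lam
    return prior_r + prior_beta - (N / M) * np.sum(loglik)
```

The full objective quoted above would then be this term evaluated on (β, ln r) with labels y, plus the same term evaluated on the second parameter set (β', ln r') with the flipped labels y' = 1 − y.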
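Finally, the hyperparameters quoted under Experiment Setup can be gathered in one place. This is only a hypothetical configuration dictionary restating the reported settings; the key names are illustrative.

```python
# Hypothetical grouping of the reported PBDN settings (key names are illustrative).
PBDN_CONFIG = {
    "gibbs": {
        "iterations": 5000,          # keep the highest-likelihood sample from the last 2500
        "a0": 0.01, "b0": 0.01,
        "e0": 1.0, "f0": 1.0,
        "a_beta": 1e-6,
    },
    "map": {
        "minibatches": 4000,
        "minibatch_size": 100,
        # Adam learning rate 0.05 / (4 + T) for the T-th added iSHM pair.
        "learning_rate": lambda T: 0.05 / (4 + T),
        "gamma0": 1.0, "c0": 1.0,
        "a_beta": 1e-6, "b_beta_k": 1e-6,
    },
}
```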