Parsimonious Bayesian deep networks

Authors: Mingyuan Zhou

NeurIPS 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We apply PBDN to four different MNIST binary classification tasks and compare its performance with DNN (128-64), a two-hidden-layer deep neural network that will be described in detail below. As shown in Tab. 1, both AIC and AIC_{ε=0.01} infer the depth as T = 1 for PBDN, and infer for each class only a few active hyperplanes, each of which represents a distinct data subtype, as calculated with (2). In a random trial, the inferred networks of PBDN for all four tasks have only a single hidden layer with at most 6 active hidden units. Thus its testing computation is much lower than that of DNN (128-64), while providing an overall lower testing error rate (both trained with 4000 mini-batches of size 100). (A sketch of this kind of AIC-based selection is given after this table.)
Researcher Affiliation | Academia | Mingyuan Zhou, Department of IROM, McCombs School of Business, The University of Texas at Austin, Austin, TX 78712; mingyuan.zhou@mccombs.utexas.edu
Pseudocode | Yes | We describe both Gibbs sampling, desirable for uncertainty quantification, and maximum a posteriori (MAP) inference, suitable for large-scale training, in Algorithm 1. We use data augmentation and marginalization to derive Gibbs sampling, with the details deferred to Appendix B. For MAP inference, we use Adam [32] in Tensorflow to minimize a stochastic objective function $f(\{\beta_k, \ln r_k\}_{k=1}^K, \{y_i, x_i\}_{i=i_1}^{i_M}) + f(\{\beta'_k, \ln r'_k\}_{k=1}^K, \{y'_i, x_i\}_{i=i_1}^{i_M})$, which embeds the hierarchical Bayesian model's inductive bias and inherent shrinking mechanism into optimization, where $M$ is the size of a randomly selected mini-batch, $y'_i := 1 - y_i$, $\lambda_i := \sum_{k=1}^K e^{\ln r_k} \ln(1 + e^{x_i^\top \beta_k})$, and $f(\{\beta_k, \ln r_k\}_{k=1}^K, \{y_i, x_i\}_{i=i_1}^{i_M}) = \sum_{k=1}^K \big[ -\tfrac{\gamma_0}{K} \ln r_k + c_0 e^{\ln r_k} \big] + (a_\beta + 1/2) \sum_{v=0}^{V} \sum_{k=1}^{K} \ln\big(1 + \beta_{vk}^2/(2 b_{\beta k})\big) + \tfrac{N}{M} \sum_{i=i_1}^{i_M} \big[ -y_i \ln(1 - e^{-\lambda_i}) + (1 - y_i)\lambda_i \big]$. (8) (A NumPy sketch of this objective is given after this table.)
Open Source Code | Yes | Code for reproducible research is available at https://github.com/mingyuanzhou/PBDN.
Open Datasets | Yes | We apply PBDN to four different MNIST binary classification tasks and compare its performance with DNN (128-64)... Below we provide a comprehensive comparison on eight widely used benchmark datasets between the proposed PBDNs and a variety of algorithms, including logistic regression, Gaussian radial basis function (RBF) kernel support vector machine (SVM), relevance vector machine (RVM) [31], adaptive multi-hyperplane machine (AMM) [27], convex polytope machine (CPM) [30], and the deep neural network (DNN) classifier (DNNClassifier) provided in Tensorflow [33]. We consider DNN (8-4), a two-hidden-layer DNN that uses 8 and 4 hidden units for its first and second hidden layers, respectively, DNN (32-16), and DNN (128-64). In the Appendix, we summarize in Tab. 4 the information on the eight benchmark datasets: banana, breast cancer, titanic, waveform, german, image, ijcnn1, and a9a.
Dataset Splits | No | The paper mentions 'training/testing partitions' but does not explicitly provide details on how these splits are generated (e.g., percentages, random seed, cross-validation setup) beyond referring to 'widely used open-source software packages'.
Hardware Specification | Yes | M. Zhou acknowledges the support of NVIDIA Corporation with the donation of the Titan Xp GPU used for this research, and the computational support of Texas Advanced Computing Center.
Software Dependencies | No | The paper mentions 'Tensorflow' but does not specify a version number or other software dependencies with version numbers.
Experiment Setup | Yes | For Gibbs sampling, we run 5000 iterations and record $\{r_k, \beta_k\}_k$ with the highest likelihood during the last 2500 iterations; for MAP, we process 4000 mini-batches of size M = 100, with 0.05/(4 + T) as the Adam learning rate for the Tth added iSHM pair. We set $a_0 = b_0 = 0.01$, $e_0 = f_0 = 1$, and $a_\beta = 10^{-6}$ for Gibbs sampling. We fix $\gamma_0 = c_0 = 1$ and $a_\beta = b_{\beta k} = 10^{-6}$ for MAP inference. (These settings are collected in the configuration sketch below.)
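
The depth-selection behavior quoted under Research Type relies on an information criterion to decide how many hidden layers, and which hyperplanes, to keep. Below is a minimal, hypothetical sketch of that kind of AIC-based selection: the function names, the 2k − 2 ln L form of AIC, and the threshold used to count a hyperplane as active are illustrative assumptions, not the paper's exact criterion (the paper counts active units with its Eq. (2)).

```python
import numpy as np

def aic(log_likelihood, num_params):
    # Standard Akaike information criterion: smaller is better.
    return 2.0 * num_params - 2.0 * log_likelihood

def choose_depth(candidate_models):
    """Pick the depth T with the lowest AIC.

    candidate_models: list of (T, log_likelihood, num_params) tuples,
    one per trained network depth (hypothetical bookkeeping format).
    """
    scores = [(aic(ll, k), T) for T, ll, k in candidate_models]
    return min(scores)[1]

def count_active_hyperplanes(r, threshold=1e-2):
    # Treat a hyperplane as active if its inferred weight r_k is non-negligible
    # (an illustrative proxy; the paper uses its own criterion in Eq. (2)).
    return int(np.sum(np.asarray(r) > threshold))
```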
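To make the quoted MAP objective of Eq. (8) concrete, here is a minimal NumPy sketch of the per-minibatch term f(·) for a single class, assuming a design matrix with a leading column of ones so that β_{0k} plays the role of a bias, and a shared shrinkage scale b_beta in place of the per-hyperplane b_{βk}. The function and variable names, and the numerical-stability epsilon, are my own additions.

```python
import numpy as np

def pbdn_map_objective(beta, log_r, X, y, N,
                       gamma0=1.0, c0=1.0, a_beta=1e-6, b_beta=1e-6):
    """Minibatch MAP objective f(.) of Eq. (8) for one class (sketch).

    beta  : (V+1, K) hyperplane weights; row 0 acts as the bias term
    log_r : (K,)     log of the hyperplane weights r_k
    X     : (M, V+1) minibatch features with a leading column of ones
    y     : (M,)     binary labels in {0, 1}
    N     : total number of training examples (rescales the minibatch term)
    """
    M, K = X.shape[0], log_r.shape[0]
    # lambda_i = sum_k exp(ln r_k) * ln(1 + exp(x_i^T beta_k))
    softplus = np.logaddexp(0.0, X @ beta)                # (M, K)
    lam = softplus @ np.exp(log_r)                        # (M,)
    # Negative log of the gamma prior on r_k (up to additive constants).
    prior_r = np.sum(-(gamma0 / K) * log_r + c0 * np.exp(log_r))
    # Heavy-tailed shrinkage penalty on every beta_{vk}.
    prior_beta = (a_beta + 0.5) * np.sum(np.log1p(beta ** 2 / (2.0 * b_beta)))
    # Bernoulli-Poisson log-likelihood, rescaled from minibatch size M to N.
    p_one = -np.expm1(-lam)                               # 1 - exp(-lambda_i)
    loglik = y * np.log(p_one + 1e-12) - (1.0 - y) * lam
    return prior_r + prior_beta - (N / M) * np.sum(loglik)
```

The full objective quoted above would then be this term evaluated on (β, ln r) with labels y, plus the same term evaluated on the second parameter set (β', ln r') with the flipped labels y' = 1 − y.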
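Finally, the hyperparameters quoted under Experiment Setup can be gathered in one place. This is only a hypothetical configuration dictionary restating the reported settings; the key names are illustrative.

```python
# Hypothetical grouping of the reported PBDN settings (key names are illustrative).
PBDN_CONFIG = {
    "gibbs": {
        "iterations": 5000,          # keep the highest-likelihood sample from the last 2500
        "a0": 0.01, "b0": 0.01,
        "e0": 1.0, "f0": 1.0,
        "a_beta": 1e-6,
    },
    "map": {
        "minibatches": 4000,
        "minibatch_size": 100,
        # Adam learning rate 0.05 / (4 + T) for the T-th added iSHM pair.
        "learning_rate": lambda T: 0.05 / (4 + T),
        "gamma0": 1.0, "c0": 1.0,
        "a_beta": 1e-6, "b_beta_k": 1e-6,
    },
}
```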