Parsimonious Bayesian deep networks
Authors: Mingyuan Zhou
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We apply PBDN to four different MNIST binary classification tasks and compare its performance with DNN (128-64), a two-hidden-layer deep neural network described in detail below. As in Tab. 1, both AIC and AICϵ=0.01 infer the depth as T = 1 for PBDN, and infer for each class only a few active hyperplanes, each of which represents a distinct data subtype, as calculated with (2). In a random trial, the inferred networks of PBDN for all four tasks have only a single hidden layer with at most 6 active hidden units. Thus its testing computation is much lower than that of DNN (128-64), while providing an overall lower testing error rate (both trained with 4000 mini-batches of size 100). |
| Researcher Affiliation | Academia | Mingyuan Zhou, Department of IROM, McCombs School of Business, The University of Texas at Austin, Austin, TX 78712, mingyuan.zhou@mccombs.utexas.edu |
| Pseudocode | Yes | We describe both Gibbs sampling, desirable for uncertainty quantification, and maximum a posteriori (MAP) inference, suitable for large-scale training, in Algorithm 1. We use data augmentation and marginalization to derive Gibbs sampling, with the details deferred to Appendix B. For MAP inference, we use Adam [32] in Tensorflow to minimize a stochastic objective function $f(\{\beta_k, \ln r_k\}_{k=1}^{K}, \{y_i, x_i\}_{i=i_1}^{i_M}) + f(\{\beta'_k, \ln r'_k\}_{k=1}^{K}, \{y'_i, x_i\}_{i=i_1}^{i_M})$, which embeds the hierarchical Bayesian model's inductive bias and inherent shrinking mechanism into optimization, where $M$ is the size of a randomly selected mini-batch, $y'_i := 1 - y_i$, $\lambda_i := \sum_{k=1}^{K} e^{\ln r_k} \ln(1 + e^{x_i'\beta_k})$, and $f(\{\beta_k, \ln r_k\}_{k=1}^{K}, \{y_i, x_i\}_{i=i_1}^{i_M}) = \sum_{k=1}^{K} \big[ -\tfrac{\gamma_0}{K} \ln r_k + c_0 e^{\ln r_k} \big] + (a_\beta + 1/2) \sum_{v=0}^{V} \sum_{k=0}^{K} \ln\big(1 + \beta_{vk}^2/(2 b_{\beta k})\big) + \tfrac{N}{M} \sum_{i=i_1}^{i_M} \big[ -y_i \ln(1 - e^{-\lambda_i}) + (1 - y_i)\lambda_i \big]$ (8). (A sketch of this objective is given after the table.) |
| Open Source Code | Yes | Code for reproducible research is available at https://github.com/mingyuanzhou/PBDN. |
| Open Datasets | Yes | We apply PBDN to four different MNIST binary classification tasks and compare its performance with DNN (128-64)... Below we provide comprehensive comparison on eight widely used benchmark datasets between the proposed PBDNs and a variety of algorithms, including logistic regression, Gaussian radial basis function (RBF) kernel support vector machine (SVM), relevance vector machine (RVM) [31], adaptive multi-hyperplane machine (AMM) [27], convex polytope machine (CPM) [30], and the deep neural network (DNN) classifier (DNNClassifier) provided in Tensorflow [33]. We consider DNN (8-4), a two-hidden-layer DNN that uses 8 and 4 hidden units for its first and second hidden layers, respectively, DNN (32-16), and DNN (128-64). In the Appendix, we summarize in Tab. 4 the information of eight benchmark datasets, including banana, breast cancer, titanic, waveform, german, image, ijcnn1, and a9a. |
| Dataset Splits | No | The paper mentions 'training/testing partitions' but does not explicitly provide details on how these splits are generated (e.g., percentages, random seed, cross-validation setup) beyond referring to 'widely used open-source software packages'. |
| Hardware Specification | Yes | M. Zhou acknowledges the support of NVIDIA Corporation with the donation of the Titan Xp GPU used for this research, and the computational support of Texas Advanced Computing Center. |
| Software Dependencies | No | The paper mentions 'Tensorflow' but does not specify a version number or other software dependencies with version numbers. |
| Experiment Setup | Yes | For Gibbs sampling, we run 5000 iterations and record {rk, βk}k with the highest likelihood during the last 2500 iterations; for MAP, we process 4000 mini-batches of size M = 100, with 0.05/(4 + T) as the Adam learning rate for the Tth added iSHM pair. We set a0 = b0 = 0.01, e0 = f0 = 1, and aβ = 10^-6 for Gibbs sampling. We fix γ0 = c0 = 1 and aβ = bβk = 10^-6 for MAP inference. (These settings are restated in a short sketch after the table.) |
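
To make the reconstructed Eq. (8) quoted in the Pseudocode row easier to check, here is a minimal NumPy sketch of one label-direction term f(·) of the MAP objective. This is an illustration under our assumptions, not the released implementation: the function name `pbdn_map_objective` and its argument layout are ours, and the paper trains this objective with Adam in TensorFlow rather than evaluating it in NumPy.

```python
import numpy as np

def pbdn_map_objective(beta, log_r, X, y, N,
                       gamma0=1.0, c0=1.0, a_beta=1e-6, b_beta=1e-6):
    """One label-direction term f(.) of the reconstructed Eq. (8).

    beta  : (V + 1, K) hyperplane coefficients, row 0 holding the bias term
    log_r : (K,)       log of the hyperplane weights r_k
    X     : (M, V + 1) mini-batch design matrix with a leading column of ones
    y     : (M,)       binary labels in {0, 1}
    N     : total number of training examples (rescales the mini-batch term)
    """
    M = X.shape[0]
    K = log_r.shape[0]

    # lambda_i = sum_k exp(ln r_k) * ln(1 + exp(x_i' beta_k))
    lam = np.logaddexp(0.0, X @ beta) @ np.exp(log_r)          # shape (M,)

    # Negative log gamma prior on r_k: -(gamma0 / K) ln r_k + c0 exp(ln r_k)
    prior_r = np.sum(-(gamma0 / K) * log_r + c0 * np.exp(log_r))

    # Shrinkage penalty on beta: (a_beta + 1/2) sum_vk ln(1 + beta_vk^2 / (2 b_beta))
    prior_beta = (a_beta + 0.5) * np.sum(np.log1p(beta ** 2 / (2.0 * b_beta)))

    # Rescaled negative log likelihood: -y ln(1 - e^{-lambda}) + (1 - y) lambda
    nll = -y * np.log1p(-np.exp(-lam)) + (1.0 - y) * lam
    return prior_r + prior_beta + (N / M) * np.sum(nll)
```

The full stochastic objective quoted above is the sum of this term and a second copy of f(·) evaluated with the flipped labels y'_i = 1 - y_i and a separate parameter set {β'_k, ln r'_k}.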
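
The MAP settings quoted in the Experiment Setup row can be collected into a small configuration sketch. Only the numeric values below come from the paper; the names `MAP_SETTINGS` and `adam_learning_rate` are our own for illustration.

```python
# Hypothetical summary of the reported MAP training settings; values from the paper.
MAP_SETTINGS = dict(
    num_minibatches=4000,  # number of mini-batches processed (as reported)
    minibatch_size=100,    # M
    gamma0=1.0,            # fixed for MAP inference
    c0=1.0,
    a_beta=1e-6,           # shrinkage hyperparameters a_beta = b_beta_k
    b_beta=1e-6,
)

def adam_learning_rate(T):
    """Reported Adam learning rate for the T-th added iSHM pair (T = 1, 2, ...)."""
    return 0.05 / (4 + T)

print([round(adam_learning_rate(T), 4) for T in (1, 2, 3)])  # [0.01, 0.0083, 0.0071]
```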