Constructing Deep Neural Networks by Bayesian Network Structure Learning

Authors: Raanan Y. Rohekar, Shami Nisimov, Yaniv Gurwicz, Guy Koren, Gal Novik

NeurIPS 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The paper states: 'We demonstrate on image classification benchmarks that the deepest layers (convolutional and dense) of common networks can be replaced by significantly smaller learned structures, while maintaining classification accuracy, state-of-the-art on tested benchmarks. Our structure learning algorithm requires a small computational cost and runs efficiently on a standard desktop CPU.'
Researcher Affiliation | Industry | All authors are affiliated with Intel AI Lab: Raanan Y. Rohekar (raanan.yehezkel@intel.com), Shami Nisimov (shami.nisimov@intel.com), Yaniv Gurwicz (yaniv.gurwicz@intel.com), Guy Koren (guy.koren@intel.com), and Gal Novik (gal.novik@intel.com).
Pseudocode | Yes | The paper provides pseudocode as Algorithm 1: G = DeepGen(g_X, X, X_ex, n).
Open Source Code | No | The paper states: 'Our structure learning algorithm is implemented using BNT (Murphy, 2001)', but it does not provide a link or an explicit statement that the authors' own code is open source or otherwise available.
Open Datasets | Yes | MNIST (LeCun et al., 1998); SVHN (Netzer et al., 2011); CIFAR-10 (Krizhevsky & Hinton, 2009); CIFAR-100 (Krizhevsky & Hinton, 2009); ImageNet (Deng et al., 2009)
Dataset Splits | No | The paper states: 'Threshold for independence tests, and the number of neurons-per-layer were selected by using a validation set.' It mentions the use of a validation set but does not provide specific split percentages or sample counts (a hedged sketch of this kind of validation-based selection is given after the table).
Hardware Specification | No | The paper mentions that the algorithm 'runs efficiently on a standard desktop CPU', which is too general to count as a hardware specification.
Software Dependencies | No | The paper mentions: 'Our structure learning algorithm is implemented using BNT (Murphy, 2001)'. While BNT is named, no version number is provided for BNT or for any other software dependency.
Experiment Setup | No | The paper mentions: 'In all the experiments, we used ReLU activations, ADAM (Kingma & Ba, 2015) optimization, batch normalization (Ioffe & Szegedy, 2015), and dropout (Srivastava et al., 2014) to all the dense layers.' It describes the types of settings used but does not provide specific hyperparameter values (e.g., learning rate, dropout rate, batch size); an illustrative sketch of such a setup is given after the table.
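To make the Experiment Setup row concrete, below is a minimal sketch of a dense classifier head built from the components the paper names (ReLU activations, batch normalization and dropout on the dense layers, Adam optimization). The layer widths, dropout rate, learning rate, and class count are assumptions for illustration only; the paper does not report these values.

# Hedged sketch of the training components named in the paper:
# ReLU activations, batch normalization, dropout on dense layers, Adam.
# Layer sizes, dropout rate, and learning rate are assumed, not reported.
import torch
import torch.nn as nn

def dense_head(in_features, hidden_sizes, num_classes, p_drop=0.5):
    """Dense layers with BatchNorm, ReLU, and Dropout (assumed hyperparameters)."""
    layers, prev = [], in_features
    for width in hidden_sizes:
        layers += [
            nn.Linear(prev, width),
            nn.BatchNorm1d(width),   # batch normalization (Ioffe & Szegedy, 2015)
            nn.ReLU(),               # ReLU activation
            nn.Dropout(p_drop),      # dropout (Srivastava et al., 2014)
        ]
        prev = width
    layers.append(nn.Linear(prev, num_classes))
    return nn.Sequential(*layers)

# Hypothetical sizes: a 512-dimensional feature vector classified into 10 classes.
model = dense_head(in_features=512, hidden_sizes=[256, 128], num_classes=10)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # ADAM (Kingma & Ba, 2015); lr assumed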
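For the Dataset Splits row, the following is a minimal sketch of the kind of validation-based selection the paper describes for the independence-test threshold. The split fraction, candidate thresholds, and the train_and_evaluate stub are illustrative assumptions, not details from the paper.

# Hedged sketch of validation-based selection of the conditional-independence
# test threshold. The split fraction, candidate thresholds, and the
# train_and_evaluate stub are assumptions; the paper reports none of them.
import numpy as np

rng = np.random.default_rng(0)
num_train = 60_000                    # e.g. the MNIST training-set size
perm = rng.permutation(num_train)
val_fraction = 0.1                    # assumed; the paper gives no split sizes
split = int(val_fraction * num_train)
val_idx, train_idx = perm[:split], perm[split:]

def train_and_evaluate(threshold, train_idx, val_idx):
    """Stub: learn the structure with this CI-test threshold, train the resulting
    network on train_idx, and return accuracy on val_idx. Illustrative only."""
    return rng.uniform(0.90, 0.99)

candidate_thresholds = [0.01, 0.05, 0.10]   # assumed p-value thresholds
best = max(candidate_thresholds,
           key=lambda t: train_and_evaluate(t, train_idx, val_idx))
print("selected independence-test threshold:", best)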