Dynamic Optimization of Neural Network Structures Using Probabilistic Modeling
Authors: Shinichi Shirakawa, Yasushi Iwata, Youhei Akimoto
AAAI 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We apply the proposed method to several structure optimization problems such as selection of layers, selection of unit types, and selection of connections using the MNIST, CIFAR-10, and CIFAR-100 datasets. The experimental results show that the proposed method can find the appropriate and competitive network structures. |
| Researcher Affiliation | Academia | Shinichi Shirakawa, Yokohama National University, shirakawa-shinichi-bg@ynu.ac.jp; Yasushi Iwata, Yokohama National University, iwata-yasushi-ct@ynu.jp; Youhei Akimoto, Shinshu University, y_akimoto@shinshu-u.ac.jp |
| Pseudocode | Yes | Algorithm 1 Optimization procedure of the proposed method instantiated with Bernoulli distribution. |
| Open Source Code | No | The paper mentions that algorithms are implemented using the Chainer framework, but it does not state that the authors' own implementation code for the proposed methodology is open-source or publicly available. |
| Open Datasets | Yes | We use the MNIST handwritten digits dataset containing the 60,000 training examples and 10,000 test examples of 28×28 gray-scale images. ... We use the CIFAR-10 and CIFAR-100 datasets in which the numbers of classes are 10 and 100, respectively. The numbers of training and test images are 50,000 and 10,000, respectively, and the size of the images is 32×32. |
| Dataset Splits | Yes | The training data is split into training and validation sets in the ratio of nine to one; the validation set is used to evaluate a hyper-parameter after training the neural network with a candidate hyper-parameter. |
| Hardware Specification | Yes | The algorithms are implemented by the Chainer framework (Tokui et al. 2015) (version 1.23.0) on NVIDIA Geforce GTX 1070 GPU for experiments (I) to (III) and on NVIDIA TITAN X GPU for experiment (IV). |
| Software Dependencies | Yes | The algorithms are implemented by the Chainer framework (Tokui et al. 2015) (version 1.23.0). ... We use the GPyOpt package (version 1.0.3, http://github.com/SheffieldML/GPyOpt) for the Bayesian optimization implementation and adopt the default parameter setting. |
| Experiment Setup | Yes | In all experiments, the SGD with a Nesterov momentum (Sutskever et al. 2013) of 0.9 and a weight decay of 10^-4 is used to optimize the weight parameters. The learning rate is divided by 10 at 1/2 and 3/4 of the maximum number of epochs. ... For (I) Selection of Layers: The data sample size and the number of epochs are set to N = 64 and 100 for Adaptive Layer (a), respectively, and N = 128 and 200 for other algorithms. ... We initialize the learning rate of SGD by 0.01 and the Bernoulli parameters by θ_init = 0.5 or θ_init = 1 - 1/31 ≈ 0.968. (A hedged sketch of the Bernoulli-parameter update described in the Pseudocode and Experiment Setup rows follows the table.) |
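The Pseudocode and Experiment Setup rows describe a Bernoulli-parameterized structure distribution whose parameters θ are updated alongside the network weights. The snippet below is a minimal NumPy sketch of a single θ-update step in that spirit; it is not the authors' Chainer implementation, and the rank-based utility, learning rate `eta`, clipping constant `eps`, and the toy sizes `D` and `N` are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def update_theta(theta, losses, masks, eta=0.1, eps=1e-3):
    """Sketch of one update of Bernoulli parameters theta (shape (D,)).

    masks:  (N, D) binary structure samples drawn from Bernoulli(theta).
    losses: (N,)   losses of the networks built from those samples.
    """
    # Rank-based utility: reward the best quarter, penalize the worst quarter
    # (the exact utility weights here are an assumption, not the paper's).
    order = np.argsort(losses)
    q = max(1, len(losses) // 4)
    u = np.zeros(len(losses))
    u[order[:q]] = 1.0
    u[order[-q:]] = -1.0
    # For a Bernoulli distribution, the natural gradient of log p(m | theta)
    # reduces to (m - theta), so the update is a utility-weighted average step.
    grad = (u[:, None] * (masks - theta)).mean(axis=0)
    # Clip to keep theta strictly inside (0, 1).
    return np.clip(theta + eta * grad, eps, 1.0 - eps)

# Toy usage with illustrative sizes: D structure bits, N sampled structures.
D, N = 31, 64
theta = np.full(D, 0.5)                       # e.g. theta_init = 0.5
masks = (rng.random((N, D)) < theta).astype(float)
losses = rng.random(N)                        # placeholder for per-sample losses
theta = update_theta(theta, losses, masks)
```

In the actual method, the sampled masks also gate the forward pass so that the weights are trained by SGD on the sampled structures within the same loop; the sketch above isolates only the distribution-parameter step.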