The High-Dimensional Geometry of Binary Neural Networks
Authors: Alexander G. Anderson, Cory P. Berg
ICLR 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Recent research has shown that one can train a neural network with binary weights and activations at train time by augmenting the weights with a high-precision continuous latent variable that accumulates small changes from stochastic gradient descent. Our main result is that the neural networks with binary weights and activations trained using the method of Courbariaux, Hubara et al. (2016) work because of the high-dimensional geometry of binary vectors. We investigate the internal representations of neural networks with binary weights and activations. A binary neural network is trained on CIFAR-10 (same learning algorithm and architecture as in Courbariaux et al. (2016)). Experiments on MNIST were carried out using both fully connected and convolutional networks and produced similar results. |
| Researcher Affiliation | Academia | Alexander G. Anderson Redwood Center for Theoretical Neuroscience University of California, Berkeley aga@berkeley.edu Cory P. Berg Redwood Center for Theoretical Neuroscience University of California, Berkeley cberg500@berkeley.edu |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any explicit statement or link regarding the availability of its source code. |
| Open Datasets | Yes | A binary neural network is trained on CIFAR-10 (same learning algorithm and architecture as in Courbariaux et al. (2016)). Experiments on MNIST were carried out using both fully connected and convolutional networks and produced similar results. |
| Dataset Splits | No | The paper mentions using CIFAR-10 and MNIST datasets but does not explicitly provide details about the train/validation/test splits, such as percentages, sample counts, or specific splitting methodology. |
| Hardware Specification | No | The paper mentions executing "7 times faster using a dedicated GPU kernel" but does not specify any particular GPU model or other hardware details (CPU, memory, specific cloud instances) used for the experiments. |
| Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., library names with specific versions). |
| Experiment Setup | Yes | The CIFAR-10 convolutional neural network has six convolutional layers, all with a 3 by 3 spatial kernel. The numbers of feature maps in the successive layers are 128, 128, 256, 256, 512, and 512. After the second, fourth, and sixth convolutions there is a 2 by 2 max pooling operation. Then there are two fully connected layers with 1024 units each. A batch normalization layer is inserted between successive layers. The experiments using ternary neural networks use the same network architecture. |
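
The "Research Type" row above quotes the paper's description of training with binary weights and activations by accumulating small SGD updates in high-precision continuous latent weights. The snippet below is a minimal NumPy sketch of that idea, not the authors' code: the `binarize` helper, the latent weight shapes, the learning rate, and the placeholder upstream gradient are all illustrative assumptions, and the straight-through gradient handling is only schematic.

```python
import numpy as np

def binarize(w):
    """Deterministic sign binarization applied to the latent weights each step."""
    return np.where(w >= 0, 1.0, -1.0)

# Hypothetical single-layer example: the high-precision latent weights
# accumulate small SGD updates, while only their signs are used in the
# forward pass.
rng = np.random.default_rng(0)
w_latent = rng.normal(scale=0.1, size=(784, 128))  # continuous latent weights
x = rng.normal(size=(32, 784))                      # a batch of inputs
lr = 0.01

for step in range(10):
    w_b = binarize(w_latent)              # binary weights used for this step
    h = np.sign(x @ w_b)                  # binary activations via sign nonlinearity
    grad_h = rng.normal(size=h.shape)     # placeholder upstream gradient (schematic)
    # Straight-through estimator: the gradient is treated as if sign() were the
    # identity, and the update is applied to the latent weights, not the binary ones.
    grad_w = x.T @ grad_h / x.shape[0]
    w_latent -= lr * grad_w
    w_latent = np.clip(w_latent, -1.0, 1.0)  # keep latents bounded, as in BinaryNet
```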
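The "Experiment Setup" row lists the CIFAR-10 architecture. Below is a hypothetical PyTorch layer stack matching that description; the framework choice, the `conv_block` helper, the padding, the Hardtanh activations standing in for the sign nonlinearity, and the placement of batch normalization relative to the activation are assumptions, and the binarization of weights and activations itself is omitted.

```python
import torch.nn as nn

def conv_block(c_in, c_out, pool=False):
    """3x3 convolution -> batch norm -> activation (Hardtanh stands in for sign)."""
    layers = [nn.Conv2d(c_in, c_out, kernel_size=3, padding=1),
              nn.BatchNorm2d(c_out),
              nn.Hardtanh()]
    if pool:
        layers.append(nn.MaxPool2d(2))
    return layers

# Six 3x3 convolutions with 128, 128, 256, 256, 512, 512 feature maps,
# 2x2 max pooling after the 2nd, 4th, and 6th convolutions, then two
# 1024-unit fully connected layers and a 10-way output for CIFAR-10.
model = nn.Sequential(
    *conv_block(3, 128),
    *conv_block(128, 128, pool=True),
    *conv_block(128, 256),
    *conv_block(256, 256, pool=True),
    *conv_block(256, 512),
    *conv_block(512, 512, pool=True),
    nn.Flatten(),
    nn.Linear(512 * 4 * 4, 1024),
    nn.BatchNorm1d(1024),
    nn.Hardtanh(),
    nn.Linear(1024, 1024),
    nn.BatchNorm1d(1024),
    nn.Hardtanh(),
    nn.Linear(1024, 10),
)
```

Under these assumptions, the three 2 by 2 pooling stages reduce 32 by 32 CIFAR-10 inputs to 4 by 4 feature maps, which is why the first fully connected layer takes 512 * 4 * 4 inputs in the sketch.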