Analytical Guarantees on Numerical Precision of Deep Neural Networks

Authors: Charbel Sakr, Yongjune Kim, Naresh Shanbhag

ICML 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We provide numerical evidence showing how our approach allows us to maintain high accuracy but with lower complexity than state-of-the-art binary networks. We conduct numerical simulations to illustrate both the validity and usefulness of the analysis developed in the previous section. We present results on two popular datasets: MNIST and CIFAR-10.
Researcher Affiliation | Academia | The authors are with the University of Illinois at Urbana Champaign, 1308 W Main St., Urbana, IL 61801 USA.
Pseudocode | No | The paper does not include any sections or figures explicitly labeled as "Pseudocode" or "Algorithm" blocks.
Open Source Code | No | The paper does not include any explicit statements about making the source code for its described methodology publicly available, nor does it provide a link to a code repository.
Open Datasets | Yes | First, we conduct simulations on the MNIST dataset for handwritten character recognition (LeCun et al., 1998). The dataset consists of 60K training samples and 10K test samples. We conduct a similar experiment on the CIFAR10 dataset (Krizhevsky & Hinton, 2009). The dataset consists of 60K color images... 50K of these images constitute the training set, and the 10K remaining are for testing.
Dataset Splits | No | The paper specifies the number of training and test samples for the MNIST and CIFAR10 datasets (e.g., "60K training samples and 10K test samples" for MNIST). It mentions using "an estimation set of 1000 random samples from the dataset" for computing bounds, but this is not specified as a validation split for model training, nor is a clear train/validation/test split for reproducibility explicitly stated. (A loading and sampling sketch follows this table.)
Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU model, CPU type, memory) used to run the experiments.
Software Dependencies | No | The paper mentions general software components like "back-propagation algorithm", "dropout", and "ReLU activations" but does not specify any software libraries or their version numbers (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup | Yes | We used a batch size of 200 and a learning rate of 0.1 with a decay rate of 0.978 per epoch. We restore the learning rate every 100 epochs; the decay rate makes the learning rate vary between 0.1 and 0.01. We train the first 300 epochs using 15% dropout, the second 300 epochs using 20% dropout, and the third 300 epochs using 25% dropout (900 epochs overall). We used ReLU activations with the subtle addition of a right rectifier for values larger than 2... We also clipped the weights to lie in [-1, 1] at each iteration.
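To make the splits quoted in the Open Datasets and Dataset Splits rows concrete, the sketch below loads both datasets and draws a 1000-sample estimation set. This is an illustration only: the paper names no framework, so torchvision, the "data" directory, the random seed, and drawing the estimation set from the training portion are all assumptions.

```python
# Minimal sketch of the reported data splits (torchvision assumed; not the authors' code).
import torch
from torch.utils.data import DataLoader, Subset
from torchvision import datasets, transforms

to_tensor = transforms.ToTensor()

# MNIST: 60K training samples, 10K test samples (as quoted above).
mnist_train = datasets.MNIST("data", train=True, download=True, transform=to_tensor)
mnist_test = datasets.MNIST("data", train=False, download=True, transform=to_tensor)

# CIFAR-10: 50K training images, 10K test images (as quoted above).
cifar_train = datasets.CIFAR10("data", train=True, download=True, transform=to_tensor)
cifar_test = datasets.CIFAR10("data", train=False, download=True, transform=to_tensor)

# "Estimation set" of 1000 random samples used for computing the precision bounds.
# The paper does not say which portion these come from; sampling the MNIST
# training set here is an assumption.
gen = torch.Generator().manual_seed(0)
idx = torch.randperm(len(mnist_train), generator=gen)[:1000]
estimation_set = Subset(mnist_train, idx.tolist())

# Batch size of 200, as reported in the Experiment Setup row.
train_loader = DataLoader(mnist_train, batch_size=200, shuffle=True)
```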
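The Experiment Setup row reads as a training configuration, sketched below as a hypothetical PyTorch rendering rather than the authors' implementation: the network architecture and layer sizes are placeholders, and the `train_loader` argument is assumed to be a DataLoader like the one in the previous sketch, while the learning-rate schedule, dropout phases, clipped ReLU, and weight clipping follow the quoted description.

```python
# Hypothetical PyTorch translation of the quoted training setup (architecture is a placeholder).
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 512),
    # ReLU with a "right rectifier": activations clipped to [0, 2].
    nn.Hardtanh(min_val=0.0, max_val=2.0),
    nn.Dropout(p=0.15),  # raised to 0.20 and 0.25 in later phases
    nn.Linear(512, 10),
)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

def set_dropout(net, p):
    """Switch dropout for the 15% / 20% / 25% phases (300 epochs each)."""
    for m in net.modules():
        if isinstance(m, nn.Dropout):
            m.p = p

def run_epoch(epoch, loader):
    # Learning rate: 0.1 decayed by 0.978 per epoch, restored every 100 epochs,
    # so it sweeps between roughly 0.1 and 0.01.
    lr = 0.1 * (0.978 ** (epoch % 100))
    for group in optimizer.param_groups:
        group["lr"] = lr
    for x, y in loader:
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()
        # Clip weights to [-1, 1] at each iteration, as reported.
        with torch.no_grad():
            for w in model.parameters():
                w.clamp_(-1.0, 1.0)

def train(train_loader):
    """900 epochs overall: three 300-epoch phases with increasing dropout."""
    for phase, drop in enumerate([0.15, 0.20, 0.25]):
        set_dropout(model, drop)
        for e in range(300):
            run_epoch(phase * 300 + e, train_loader)
```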