Sigma Delta Quantized Networks

Authors: Peter O'Connor, Max Welling

ICLR 2017

Reproducibility Variable | Result | LLM Response
Research Type: Experimental
"We started our experiment with a conventional ReLU network with layer sizes [784-200-200-10] pretrained on MNIST to a test-accuracy of 97.9%. We then apply the same scale-optimization procedure for the Rounding Network used in the previous experiment to find the optimal rescalings under a range of values for λ. This time, we test the learned scale parameters on both the Rounding Network and the Sigma-Delta network. We do not attempt to directly optimize the scales with respect to the amount of computation in the Sigma-Delta network; we assume that the result should be similar to that for the rounding network, but verifying this is the topic of future work. The results of this experiment can be seen in Figure 4. We see that our discretized networks (Rounding and Sigma-Delta) converge to the error of the original network with fewer computations than are required for a forward pass of the original neural network."
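For intuition about the quoted experiment, here is a minimal numpy sketch of a rounding network's forward pass: each layer input is multiplied by a learned scale, rounded to integers, and unscaled again, so the computation a layer triggers is roughly proportional to the summed magnitudes of the rounded units. All names (`layers`, `scales`, `rounding_net_forward`) are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def rounding_net_forward(x, layers, scales):
    """Hedged sketch of a rounding-network forward pass.

    layers: list of (W, b) pairs; scales: one scale per layer.
    Rounded units act as integer 'spike counts', so the additions a
    layer triggers scale with sum(|rounded units|) * fan-out. These
    names are assumptions, not taken from the paper's code.
    """
    a, n_adds = x, 0
    for i, ((W, b), s) in enumerate(zip(layers, scales)):
        u = np.round(a * s)                          # integer activations
        n_adds += int(np.abs(u).sum()) * W.shape[1]  # approx. add count
        z = (u / s) @ W + b
        a = z if i == len(layers) - 1 else np.maximum(0.0, z)  # ReLU
    return a, n_adds  # logits and an approximate computation count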
Researcher Affiliation: Academia
Peter O'Connor, Max Welling
QUVA Lab, Informatics Institute, University of Amsterdam, Amsterdam, Netherlands
{p.e.oconnor,m.welling}@uva.nl
Pseudocode: Yes
Algorithm 1 Temporal Difference (Δ_T):
  1: Internal: x_last ∈ R^d ← 0
  2: Input: x ∈ R^d
  3: y ← x − x_last
  4: x_last ← x
  5: Return: y ∈ R^d
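Algorithm 1 translates directly into a few lines of Python; the sketch below mirrors it line for line (the class name and interface are my own, not from the paper's code):

```python
import numpy as np

class TemporalDifference:
    """Direct transcription of Algorithm 1: returns the change in the
    input since the previous call."""
    def __init__(self, d):
        self.x_last = np.zeros(d)   # 1: Internal: x_last in R^d <- 0

    def __call__(self, x):          # 2: Input: x in R^d
        y = x - self.x_last         # 3: y <- x - x_last
        self.x_last = x.copy()      # 4: x_last <- x
        return y                    # 5: Return: y in R^d
```

Feeding a constant input produces that value once and zeros afterwards, which is what makes temporally redundant data cheap for the downstream integer arithmetic.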
Open Source Code: Yes
"Code for our experiments can be found at: https://github.com/petered/sigma-delta/"
Open Datasets: Yes
"Temporal-MNIST. This is just a reshuffling of the standard MNIST dataset so that similar frames tend to be nearby, giving the impression of a temporal sequence (see Appendix D for details)."
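Appendix D of the paper gives the actual construction. Purely to illustrate the idea of "reshuffling so similar frames tend to be nearby", a greedy nearest-neighbour ordering does something comparable; this is an assumption-laden stand-in, not the paper's procedure, and every name here is hypothetical:

```python
import numpy as np

def greedy_temporal_order(images, n=2000, seed=0):
    """Reorder a random subset of MNIST so each frame is followed by
    its nearest unused neighbour (L2 distance). A rough stand-in for
    the Temporal-MNIST construction, not Appendix D's exact recipe."""
    rng = np.random.default_rng(seed)
    subset = rng.choice(len(images), size=n, replace=False)
    flat = images[subset].reshape(n, -1).astype(np.float64)
    order, unused = [0], list(range(1, n))
    while unused:
        cand = np.asarray(unused)
        d2 = ((flat[cand] - flat[order[-1]]) ** 2).sum(axis=1)
        order.append(unused.pop(int(np.argmin(d2))))
    return subset[np.asarray(order)]
```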
Dataset Splits: No
The paper reports a test accuracy for the MNIST-pretrained network but does not explicitly state the train/validation/test splits or a cross-validation setup.
Hardware Specification: No
The paper mentions a 45 nm silicon process but does not specify the hardware used to run the experiments; it only discusses theoretical energy consumption on such hardware.
Software Dependencies: No
The paper does not explicitly list specific software dependencies with version numbers, such as Python or deep-learning framework versions.
Experiment Setup: Yes
"We started our experiment with a conventional ReLU network with layer sizes [784-200-200-10] pretrained on MNIST to a test-accuracy of 97.9%."
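The quoted setup pins down the base architecture exactly ([784-200-200-10] with ReLU units), so its forward pass is easy to sketch. Weight shapes below follow from the stated layer sizes; parameter names are my own:

```python
import numpy as np

def base_mlp_forward(x, params):
    """Forward pass of the pretrained [784-200-200-10] ReLU network.

    params = [(W1, b1), (W2, b2), (W3, b3)] with weight shapes
    (784, 200), (200, 200), (200, 10); names are illustrative.
    """
    (W1, b1), (W2, b2), (W3, b3) = params
    h1 = np.maximum(0.0, x @ W1 + b1)   # 784 -> 200, ReLU
    h2 = np.maximum(0.0, h1 @ W2 + b2)  # 200 -> 200, ReLU
    return h2 @ W3 + b3                 # 200 -> 10 logits
```

This float network is the starting point that the Rounding and Sigma-Delta variants discretize.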