Sigma Delta Quantized Networks
Authors: Peter O'Connor, Max Welling
ICLR 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We started our experiment with a conventional ReLU network with layer sizes [784-200-200-10] pretrained on MNIST to a test accuracy of 97.9%. We then apply the same scale-optimization procedure for the Rounding Network used in the previous experiment to find the optimal rescalings under a range of values for λ. This time, we test the learned scale parameters on both the Rounding Network and the Sigma-Delta network. We do not attempt to directly optimize the scales with respect to the amount of computation in the Sigma-Delta network; we assume that the result should be similar to that for the rounding network, but verifying this is the topic of future work. The results of this experiment can be seen in Figure 4. We see that our discretized networks (Rounding and Sigma-Delta) converge to the error of the original network with fewer computations than are required for a forward pass of the original neural network. (A minimal sketch of the Sigma-Delta mechanism appears after the table.) |
| Researcher Affiliation | Academia | Peter O'Connor, Max Welling, QUVA Lab, Informatics Institute, University of Amsterdam, Amsterdam, Netherlands, {p.e.oconnor,m.welling}@uva.nl |
| Pseudocode | Yes | Algorithm 1 Temporal Difference (∆T): 1: Internal: x_last ∈ ℝ^d ← 0; 2: Input: x ∈ ℝ^d; 3: y ← x − x_last; 4: x_last ← x; 5: Return: y ∈ ℝ^d. (A runnable sketch of this algorithm appears after the table.) |
| Open Source Code | Yes | Code for our experiments can be found at: https://github.com/petered/sigma-delta/ |
| Open Datasets | Yes | Temporal-MNIST. This is just a reshuffling of the standard MNIST dataset so that similar frames tend to be nearby, giving the impression of a temporal sequence (see Appendix D for details). |
| Dataset Splits | No | The paper reports a test accuracy for the MNIST-pretrained network, but doesn't explicitly state the train/validation/test splits or a cross-validation setup. |
| Hardware Specification | No | The paper mentions a 45 nm silicon process but does not specify the hardware used for running its experiments; it only discusses the theoretical energy consumption on such hardware. |
| Software Dependencies | No | The paper does not explicitly list specific software dependencies with version numbers, such as Python or PyTorch versions. |
| Experiment Setup | Yes | We started our experiment with a conventional ReLU network with layer sizes [784-200-200-10] pretrained on MNIST to a test accuracy of 97.9%. |
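
The Temporal Difference pseudocode quoted in the table maps directly to a few lines of code. Below is a minimal NumPy sketch of Algorithm 1; the class name and interface are our own choices for illustration, not taken from the authors' repository.

```python
import numpy as np

class TemporalDifference:
    """Algorithm 1 (Temporal Difference): emit the change in the
    input vector since the previous call, y = x - x_last."""

    def __init__(self, d):
        self.x_last = np.zeros(d)  # internal state, initialized to 0

    def __call__(self, x):
        y = x - self.x_last                      # difference from previous input
        self.x_last = np.array(x, dtype=float)   # remember current input
        return y

# Example: after the first step, only changes in the signal are emitted.
td = TemporalDifference(d=3)
print(td(np.array([1.0, 2.0, 3.0])))  # -> [1. 2. 3.]
print(td(np.array([1.0, 2.0, 4.0])))  # -> [0. 0. 1.]
```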
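
For context on the Sigma-Delta network mentioned in the Research Type row: the paper builds it by composing temporal integration, quantization, and the temporal difference above (Σ∆ = ∆T ∘ Q ∘ ΣT). The sketch below is our own minimal rendering of that composition, assuming round-to-nearest quantization for Q; it is illustrative, not the authors' implementation.

```python
import numpy as np

class SigmaDelta:
    """Minimal sketch of a sigma-delta quantizer: integrate the input
    over time, round the running sum, and emit only the change in the
    rounded sum. The cumulative integer output then tracks a rounded
    version of the cumulative input, so quantization error does not
    accumulate over time."""

    def __init__(self, d):
        self.phi = np.zeros(d)     # running pre-quantization sum (Sigma_T)
        self.s_last = np.zeros(d)  # previously emitted rounded sum

    def __call__(self, x):
        self.phi += x              # Sigma_T: temporal integration
        s = np.round(self.phi)     # Q: quantize the running sum
        y = s - self.s_last        # Delta_T: emit integer changes only
        self.s_last = s
        return y
```

On slowly varying inputs such as Temporal-MNIST frames, a module like this emits mostly zeros, which is consistent with the paper's claim that the discretized networks reach the original network's error with fewer computations than a dense forward pass.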