Neural Arithmetic Logic Units
Authors: Andrew Trask, Felix Hill, Scott E. Reed, Jack Rae, Chris Dyer, Phil Blunsom
NeurIPS 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments show that NALU-enhanced neural networks can learn to track time, perform arithmetic over images of numbers, translate numerical language into real-valued scalars, execute computer code, and count objects in images. In contrast to conventional architectures, we obtain substantially better generalization both inside and outside of the range of numerical values encountered during training, often extrapolating orders of magnitude beyond trained numerical ranges. |
| Researcher Affiliation | Collaboration | DeepMind; University of Oxford; University College London ({atrask,felixhill,reedscot,jwrae,cdyer,pblunsom}@google.com) |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide access to source code for the described methodology. A link to third-party MNIST examples is given, but not the authors' own NALU/NAC code (a hedged sketch of the NAC and NALU cells appears after this table). |
| Open Datasets | Yes | In the MNIST Digit Counting task, the model must learn to count how many images of each type it has seen (a 10-way regression), and in the MNIST Digit Addition task, it must learn to compute the sum of the digits it observed (a linear regression). Each training series is formed using images from the MNIST digit training set, and each testing series from the MNIST test set. |
| Dataset Splits | Yes | We trained and tested using numbers from 0 to 1000. The training set consists of the numbers 0-19 in addition to a random sample from the rest of the interval, adjusted to make sure that each unique token is present at least once in the training set. There are 169 examples in the training set, 200 in validation, and 631 in the test set. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types, or memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper mentions PyTorch in a footnote, but it does not provide a specific version number, nor does it list any other software dependencies with version numbers. |
| Experiment Setup | Yes | We train an autoencoder to take a scalar value as input (e.g., the number 3), encode the value within its hidden layers (distributed representations), then reconstruct the input value as a linear combination of the last hidden layer (3 again). Each autoencoder we train is identical in its parameterization (3 hidden layers of size 8), tuning (10,000 iterations, learning rate of 0.01, squared error loss), and initialization, differing only on the choice of nonlinearity on hidden layers. (A hedged sketch of this setup also appears after the table.) |
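Because no official implementation is released (see the Open Source Code row), the sketch below shows the NAC and NALU cells as the paper defines them: a NAC computes a = Wx with W = tanh(Ŵ) ⊙ σ(M̂); a NALU gates between that additive path and a multiplicative path that runs the same NAC in log-space, y = g ⊙ a + (1 − g) ⊙ m with m = exp(W(log(|x| + ε))) and g = σ(Gx). This is a minimal PyTorch sketch, not the authors' code; the module names, initialization scheme, and ε value are assumptions.

```python
# Hedged sketch of the NAC and NALU cells described in the paper.
# Parameter names, initialization, and eps are assumptions; only the
# formulas follow the paper's definitions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class NAC(nn.Module):
    """Neural Accumulator: a = Wx with W = tanh(W_hat) * sigmoid(M_hat)."""

    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.W_hat = nn.Parameter(torch.empty(out_dim, in_dim))
        self.M_hat = nn.Parameter(torch.empty(out_dim, in_dim))
        nn.init.xavier_uniform_(self.W_hat)  # init scheme is an assumption
        nn.init.xavier_uniform_(self.M_hat)

    def forward(self, x):
        W = torch.tanh(self.W_hat) * torch.sigmoid(self.M_hat)  # biased toward {-1, 0, 1}
        return F.linear(x, W)


class NALU(nn.Module):
    """NALU: learned gate between an additive NAC path and a multiplicative
    path that applies the same NAC weights in log-space."""

    def __init__(self, in_dim, out_dim, eps=1e-7):
        super().__init__()
        self.eps = eps
        self.nac = NAC(in_dim, out_dim)
        self.G = nn.Parameter(torch.empty(out_dim, in_dim))
        nn.init.xavier_uniform_(self.G)

    def forward(self, x):
        a = self.nac(x)                                               # add / subtract path
        m = torch.exp(self.nac(torch.log(torch.abs(x) + self.eps)))  # multiply / divide path
        g = torch.sigmoid(F.linear(x, self.G))                       # gate g = sigmoid(Gx)
        return g * a + (1 - g) * m
```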
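The Experiment Setup row can likewise be made concrete with a short sketch. Only the layer sizes (3 hidden layers of size 8), iteration count (10,000), learning rate (0.01), and squared error loss come from the quoted text; the training range, batch size, optimizer, test range, and the particular set of nonlinearities compared are assumptions chosen for illustration.

```python
# Hedged sketch of the scalar identity-autoencoder extrapolation setup.
# Training range, batching, optimizer, and test range are assumptions.
import torch
import torch.nn as nn


def make_autoencoder(nonlinearity: nn.Module) -> nn.Sequential:
    """Scalar input -> 3 hidden layers of size 8 -> linear scalar readout."""
    return nn.Sequential(
        nn.Linear(1, 8), nonlinearity,
        nn.Linear(8, 8), nonlinearity,
        nn.Linear(8, 8), nonlinearity,
        nn.Linear(8, 1),  # reconstruction as a linear combination of the last hidden layer
    )


def train(nonlinearity: nn.Module, train_range=(-5.0, 5.0), iters=10_000, lr=0.01):
    model = make_autoencoder(nonlinearity)
    opt = torch.optim.SGD(model.parameters(), lr=lr)  # optimizer choice is an assumption
    lo, hi = train_range
    for _ in range(iters):
        x = torch.empty(64, 1).uniform_(lo, hi)   # scalar inputs from the training range
        loss = ((model(x) - x) ** 2).mean()       # squared error reconstruction loss
        opt.zero_grad()
        loss.backward()
        opt.step()
    return model


# Compare extrapolation error under different hidden nonlinearities.
for act in (nn.ReLU(), nn.Tanh(), nn.Sigmoid()):
    model = train(act)
    x_test = torch.linspace(-20.0, 20.0, 401).unsqueeze(1)  # beyond the training range
    with torch.no_grad():
        err = (model(x_test) - x_test).abs().mean().item()
    print(f"{act.__class__.__name__}: mean extrapolation error {err:.3f}")
```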