Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026.
Neural Arithmetic Units
Authors: Andreas Madsen, Alexander Rosenberg Johansen
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 4 EXPERIMENTAL RESULTS |
| Researcher Affiliation | Academia | Andreas Madsen Computationally Demanding EMAIL Alexander Rosenberg Johansen Technical University of Denmark EMAIL |
| Pseudocode | Yes | Algorithm 1 defines the exact procedure to generate the data, where an interpolation range will be used for training and validation and an extrapolation range will be used for testing. |
| Open Source Code | Yes | Implementation is available on GitHub: https://github.com/AndreasMadsen/stable-nalu. |
| Open Datasets | Yes | Furthermore, we improve upon existing benchmarks in Trask et al. (2018) by expanding the simple function task, expanding MNIST Counting and Arithmetic Tasks with a multiplicative task, and using an improved success-criterion Madsen & Johansen (2019). |
| Dataset Splits | Yes | Each experiment is trained for 5 × 10^6 iterations with early stopping by using the validation dataset, which is based on the interpolation range (details in Appendix C.2). |
| Hardware Specification | Yes | Training takes about 8 hours on a single CPU core (8-Core Intel Xeon E5-2665 2.4GHz). |
| Software Dependencies | No | The paper mentions 'Adam optimization (Kingma & Ba, 2014)' and refers to 'pytorch' but does not provide specific version numbers for these software components. |
| Experiment Setup | Yes | Each experiment is trained for 5 × 10^6 iterations with early stopping by using the validation dataset, which is based on the interpolation range (details in Appendix C.2). |