Neural Power Units

Authors: Niklas Heim, Tomáš Pevný, Václav Šmídl

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We show that the NPUs outperform their competitors in terms of accuracy and sparsity on artificial arithmetic datasets, and that the Real NPU can discover the governing equations of a dynamical system only from data.
Researcher Affiliation | Academia | Niklas Heim, Tomáš Pevný, Václav Šmídl, Artificial Intelligence Center, Czech Technical University, Prague, CZ 120 00, {niklas.heim, tomas.pevny, vasek.smidl}@aic.fel.cvut.cz
Pseudocode | No | The paper provides mathematical definitions and diagrams (Fig. 1, Fig. 2) of the NPU and Naive NPU, but does not include any pseudocode or algorithm blocks; a hedged sketch of the underlying power-unit idea is given after this table.
Open Source Code | Yes | Implementation of Neural Arithmetic Units: github.com/nmheim/NeuralArithmetic.jl. The code to reproduce our experiments is available at github.com/nmheim/NeuralPowerUnits.
Open Datasets | No | No concrete access information (specific link, DOI, repository name, formal citation, or reference to established benchmark datasets) for a publicly available or open dataset was provided. The paper describes data generation processes for its experiments (e.g., 'We have numerically simulated one realization of the fSIR model', 'input samples are generated on the fly', 'dataset generation is defined in the set of Eq. 23').
Dataset Splits | No | The paper does not explicitly provide training/validation/test dataset splits. For the fSIR model, it mentions 'training data X' but no validation split. For the simple and large-scale arithmetic tasks, it states inputs are 'generated on the fly during training' and focuses on training and testing without specifying a validation set.
Hardware Specification | No | The paper does not explicitly describe the specific hardware (e.g., GPU/CPU models, memory amounts, or cloud instances) used to run the experiments.
Software Dependencies | No | The paper mentions the Julia packages Flux.jl [Innes et al., 2018] and DifferentialEquations.jl [Rackauckas and Nie, 2017], but does not specify their version numbers or list other software dependencies with versions.
Experiment Setup | Yes | We train each model for 3000 steps with the ADAM optimizer and a learning rate of 0.005, and subsequently with LBFGS until convergence (or for maximum 1000 steps). For each model type, we run a small grid search to build a Pareto front with h ∈ {6, 9, 12, 15, 20} and β ∈ {0, 0.01, 0.1, 1}. (See the hedged training sketch after this table.)
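
The NPU definitions themselves are not reproduced on this page, but the core power-unit idea can be illustrated with a short, hedged sketch. The Julia snippet below is a minimal simplification assuming only the log-space product formulation: each output is a product of (absolute) inputs raised to learned exponents. The layer name PowerUnitSketch and its initialization are illustrative choices, not the paper's NPU or Naive NPU, which additionally handle negative inputs and, in the Real NPU, gate irrelevant inputs.

```julia
using Flux

# Illustrative power-unit-style layer (not the paper's exact NPU):
# each output is a product of the inputs raised to learned exponents,
#   y_j = prod_i |x_i|^W[j, i],
# computed in log space for numerical stability.
struct PowerUnitSketch
    W::Matrix{Float32}
end

# Small random exponents as a simple (assumed) initialization.
PowerUnitSketch(nin::Int, nout::Int) = PowerUnitSketch(0.1f0 .* randn(Float32, nout, nin))

# Register the weights as trainable parameters.
Flux.@functor PowerUnitSketch

# The small epsilon keeps the log well defined at x = 0.
(m::PowerUnitSketch)(x) = exp.(m.W * log.(abs.(x) .+ 1f-7))
```

For example, a `PowerUnitSketch(2, 1)` whose weights converge near `[1 1]` approximates the product of its two inputs; exponents such as 0.5 or -1 give square roots and division in the same way.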
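
The experiment-setup quote above maps to a concrete training configuration. The following sketch shows, under stated assumptions, what the first (ADAM) phase could look like with the implicit-parameter Flux.jl API of the paper's era; `model`, `generate_batch`, and `train_adam_phase!` are hypothetical placeholders, the loss is assumed to be mean-squared error, and the LBFGS refinement and β-weighted sparsity penalty are only noted in comments because their exact wiring is not described here.

```julia
using Flux

# Hypothetical sketch of the first training phase reported above:
# 3000 ADAM steps at learning rate 0.005. A second phase would hand
# the trained parameters to an LBFGS optimiser (e.g. from Optim.jl)
# and run until convergence or at most 1000 steps.
function train_adam_phase!(model, generate_batch; steps = 3000, lr = 0.005)
    opt = ADAM(lr)
    ps  = Flux.params(model)
    for _ in 1:steps
        x, y = generate_batch()            # arithmetic samples drawn on the fly
        gs = gradient(ps) do
            ŷ = model(x)
            # Assumed loss: mean-squared error; a β-weighted sparsity
            # penalty on the weights would be added here for the grid search.
            sum(abs2, ŷ .- y) / length(y)
        end
        Flux.Optimise.update!(opt, ps, gs)
    end
    return model
end
```

A grid over the hidden size h ∈ {6, 9, 12, 15, 20} and penalty weight β ∈ {0, 0.01, 0.1, 1} would then call such a routine once per configuration to build the reported Pareto front.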