Neural Execution Engines: Learning to Execute Subroutines

Authors: Yujun Yan, Kevin Swersky, Danai Koutra, Parthasarathy Ranganathan, Milad Hashemi

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | First, we observe that transformer-based sequence-to-sequence models can learn subroutines like sorting a list of numbers, but their performance rapidly degrades as the length of lists grows beyond those found in the training set. To address this issue, we propose a learned conditional masking mechanism, which enables the model to strongly generalize far outside of its training range with near-perfect accuracy on a variety of algorithms. We evaluate using a NEE to execute various algorithms, including selection sort and merge sort, as well as more complex graph algorithms like shortest path and minimum spanning tree. Figure 3: Sorting performance of transformers trained on sequences of up to length 8. Table 1: Performance of different tasks on variable sizes of test examples (trained with examples of size 8). Table 2: NEE 8-bit addition performance. (An illustrative decode-loop sketch appears after this table.)
Researcher Affiliation | Collaboration | Yujun Yan, University of Michigan, yujunyan@umich.edu; Kevin Swersky, Google Research, kswersky@google.com; Danai Koutra, University of Michigan, dkoutra@umich.edu; Parthasarathy Ranganathan and Milad Hashemi, Google Research, {parthas, miladh}@google.com. Work completed during an internship at Google.
Pseudocode | Yes | Figure 1: (Top) Pseudocode for four common algorithms: selection sort, merge sort, Dijkstra's algorithm for shortest paths, and Prim's algorithm for minimum spanning tree. (A reference sketch written around the same subroutines appears after this table.)
Open Source Code | Yes | Code for this paper can be found at https://github.com/Yujun-Yan/Neural-Execution-Engines
Open Datasets | Yes | The examples are Erdős–Rényi random graphs. We train on graphs with up to 8 nodes and test on graphs of up to 100 nodes, with 100 graphs evaluated at each size. Weights are randomly assigned within the allowed 8-bit number range. Training on the entire 8-bit number range (256 numbers) and testing on unseen pairs of numbers, the NEE achieves 100% accuracy. (A graph-sampling sketch appears after this table.)
Dataset Splits | No | The paper describes training and testing ranges, e.g., 'train on graphs with up to 8 nodes and test on graphs of up to 100 nodes'. It also mentions 'holding out random numbers during training' for addition. However, it does not specify an explicit validation split or quantitative train/test/validation percentages or counts. (An illustrative addition-split sketch appears after this table.)
Hardware Specification | No | The paper does not explicitly mention any specific hardware used for running the experiments, such as GPU models, CPU types, or cloud computing instances.
Software Dependencies | No | The paper mentions using a 'transformer-based network' but does not provide specific software dependencies with version numbers (e.g., Python version, library versions such as PyTorch or TensorFlow).
Experiment Setup | No | The paper provides some details about the experimental approach, such as using a 'greedy decoding strategy' and 'one-hot encoded' inputs for a vanilla transformer, and it mentions that the authors 'minimally tune hyper-parameters' and that 'weights are randomly assigned'. It also states that further modifications and ablations are in Appendix A.2 (which is not provided). However, it lacks specific numerical hyperparameters (e.g., learning rate, batch size, number of epochs) or detailed training configurations required for full reproducibility in the main text. (An encoding/decoding sketch appears after this table.)
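
To make the 'learned conditional masking mechanism' row concrete, here is a minimal illustrative sketch of an emit-and-mask decode loop for sorting. It is not the authors' implementation: find_min stands in for the trained NEE subroutine, and oracle_find_min is a hypothetical oracle used only to sanity-check the loop on sequences far longer than the training length of 8.

```python
import numpy as np

def sort_with_learned_mask(values, find_min):
    """Illustrative decode loop: repeatedly ask a learned subroutine for the
    index of the smallest still-unmasked element, emit that element, then mask
    it out so the next step conditions only on the remaining data."""
    mask = np.ones(len(values), dtype=bool)      # True = element still available
    output = []
    for _ in range(len(values)):
        idx = find_min(values, mask)             # stand-in for the trained NEE subroutine
        output.append(int(values[idx]))
        mask[idx] = False                        # conditional mask update
    return output

def oracle_find_min(values, mask):
    """Hypothetical oracle standing in for the learned find-min subroutine."""
    hidden = np.where(mask, values, np.iinfo(values.dtype).max)
    return int(np.argmin(hidden))

seq = np.random.randint(0, 256, size=100)        # much longer than the training length of 8
assert sort_with_learned_mask(seq, oracle_find_min) == sorted(seq.tolist())
```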
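The pseudocode row lists four algorithms that share the same primitive operations. As a plain-Python reference (not the authors' pseudocode), Dijkstra's algorithm can be written around exactly the two subroutines the NEE learns to execute: selecting the minimum over a set of candidates, and add-then-compare edge relaxation.

```python
import math

def dijkstra(adj, source):
    """Reference shortest-path computation built from two primitives:
    find-min over unvisited nodes, and add-then-compare relaxation.
    `adj` maps node -> {neighbor: edge weight}."""
    dist = {v: math.inf for v in adj}
    dist[source] = 0
    unvisited = set(adj)
    while unvisited:
        u = min(unvisited, key=dist.__getitem__)     # primitive 1: find-min
        unvisited.remove(u)
        for v, w in adj[u].items():
            dist[v] = min(dist[v], dist[u] + w)      # primitive 2: add, then compare
    return dist

example = {0: {1: 4, 2: 1}, 1: {3: 1}, 2: {1: 2, 3: 5}, 3: {}}
print(dijkstra(example, 0))   # {0: 0, 1: 3, 2: 1, 3: 4}
```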
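For the graph datasets described in the open-datasets row, a sketch of how such examples could be sampled with networkx and numpy. Only the 8-bit weight range, the up-to-8-node training sizes, the up-to-100-node test sizes, and the 100 test graphs per size come from the row above; the edge probability p and the number of training graphs per size are placeholders not stated in the extracted text.

```python
import numpy as np
import networkx as nx

def sample_graph(n, p=0.5, rng=None):
    """Sample one Erdős–Rényi graph with edge weights drawn from the 8-bit range.
    The edge probability p is a placeholder; the paper extract does not state it."""
    rng = rng or np.random.default_rng()
    g = nx.erdos_renyi_graph(n, p, seed=int(rng.integers(2**31)))
    for u, v in g.edges:
        g.edges[u, v]["weight"] = int(rng.integers(0, 256))   # 8-bit weight range
    return g

rng = np.random.default_rng(0)
# Train on graphs with up to 8 nodes (10 graphs per size is a placeholder count).
train_graphs = [sample_graph(n, rng=rng) for n in range(2, 9) for _ in range(10)]
# Test on larger graphs; 100 graphs per size as stated, here shown for size 100 only.
test_graphs = [sample_graph(100, rng=rng) for _ in range(100)]
```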
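The dataset-splits row notes held-out data for the addition task but gives no ratios. A hypothetical split, assuming unseen operand pairs are withheld from training as described in the open-datasets row; the 90/10 ratio and the seed are arbitrary placeholders.

```python
import itertools
import random

# Hypothetical split for the 8-bit addition task: train on a random subset of all
# (a, b) operand pairs and test on the withheld, unseen pairs. The 90/10 ratio
# and the seed are arbitrary; the extracted text gives no numbers.
random.seed(0)
all_pairs = list(itertools.product(range(256), repeat=2))
random.shuffle(all_pairs)
cut = int(0.9 * len(all_pairs))
train_pairs, test_pairs = all_pairs[:cut], all_pairs[cut:]
targets = {(a, b): a + b for a, b in all_pairs}   # sums above 255 exercise the carry-out
```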
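Finally, a small sketch of the two setup details the experiment-setup row does quote: one-hot encoded inputs over the 8-bit vocabulary for the vanilla-transformer baseline, and greedy (argmax) decoding. Everything else about the training configuration is unspecified in the extract, so nothing further is assumed here.

```python
import numpy as np

VOCAB = 256   # the 8-bit number range used throughout the experiments

def one_hot(seq):
    """One-hot encode a sequence of 8-bit integers, as described for the
    vanilla-transformer baseline inputs."""
    x = np.zeros((len(seq), VOCAB), dtype=np.float32)
    x[np.arange(len(seq)), seq] = 1.0
    return x

def greedy_decode_step(logits):
    """Greedy decoding: pick the single highest-scoring token at each step,
    with no beam search. `logits` is a length-VOCAB score vector."""
    return int(np.argmax(logits))

print(one_hot([3, 7, 255]).shape)              # (3, 256)
print(greedy_decode_step(np.random.randn(VOCAB)))
```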