End-to-end Algorithm Synthesis with Recurrent Networks: Extrapolation without Overthinking

Authors: Arpit Bansal, Avi Schwarzschild, Eitan Borgnia, Zeyad Emam, Furong Huang, Micah Goldblum, Tom Goldstein

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We experiment on benchmark problems for measuring extrapolation behavior. These tasks include computing prefix sums, finding optimal solutions for two-dimensional mazes, and solving chess puzzles [21]. Our architectures and training routines significantly outperform existing methods on all tasks in the benchmark suite.
Researcher Affiliation | Academia | Arpit Bansal (University of Maryland, bansal01@umd.edu); Avi Schwarzschild (University of Maryland, avi1@umd.edu); Eitan Borgnia (University of Chicago, eborgnia@uchicago.edu); Zeyad Emam (University of Maryland); Furong Huang (University of Maryland); Micah Goldblum (New York University); Tom Goldstein (University of Maryland)
Pseudocode | Yes | Algorithm 1: Incremental Progress Training
    Input: parameter vector θ, integer m, weight α
    for batch_idx = 1, 2, ... do
        Choose n ~ U{0, m - 1} and k ~ U{1, m - n}
        Compute φ_n with n iterations w/o tracking gradients
        Compute ŷ_prog with k additional iterations
        Compute ŷ_m with a new forward pass of m iterations
        Compute L_max_iters with ŷ_m and L_progressive with ŷ_prog
        Compute L = (1 - α) L_max_iters + α L_progressive
        Compute ∇_θ L and update θ
    end for
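As a concrete illustration of Algorithm 1, the following minimal PyTorch sketch performs one incremental-progress training step. The recurrent interface (model(x, iters, interim_thought) returning a prediction and an interim thought), the TinyRecurrentNet toy architecture, and the cross-entropy losses are assumptions made for illustration, not the authors' released implementation.

```python
# Minimal sketch of Algorithm 1 (incremental progress training).
# The model interface and toy architecture below are illustrative assumptions,
# not the code from github.com/aks2203/deep-thinking.
import random
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyRecurrentNet(nn.Module):
    """Toy recurrent block for 1-D binary sequences (e.g. prefix sums)."""
    def __init__(self, width=64):
        super().__init__()
        self.embed = nn.Conv1d(1, width, 3, padding=1)
        # "Recall"-style block: the input is re-injected at every iteration.
        self.recur = nn.Conv1d(width + 1, width, 3, padding=1)
        self.head = nn.Conv1d(width, 2, 3, padding=1)

    def forward(self, x, iters, interim_thought=None):
        if interim_thought is None:
            interim_thought = F.relu(self.embed(x))
        for _ in range(iters):
            interim_thought = F.relu(self.recur(torch.cat([interim_thought, x], dim=1)))
        return self.head(interim_thought), interim_thought

def train_step(model, optimizer, x, y, m=30, alpha=0.5):
    """One batch of incremental progress training (sketch of Algorithm 1)."""
    n = random.randint(0, m - 1)   # n ~ U{0, m - 1}
    k = random.randint(1, m - n)   # k ~ U{1, m - n}

    # Partial solution φ_n: n iterations without tracking gradients.
    with torch.no_grad():
        _, phi_n = model(x, iters=n)

    # Progressive pass: k additional iterations starting from φ_n.
    y_prog, _ = model(x, iters=k, interim_thought=phi_n.detach())

    # Full pass: a fresh forward pass with m iterations.
    y_m, _ = model(x, iters=m)

    loss_max_iters = F.cross_entropy(y_m, y)
    loss_progressive = F.cross_entropy(y_prog, y)
    loss = (1 - alpha) * loss_max_iters + alpha * loss_progressive

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In this sketch, x would be a [batch, 1, length] float tensor of bits and y a [batch, length] tensor of integer targets; only the k progressive iterations and the fresh m-iteration pass contribute gradients, as in the algorithm above.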
Open Source Code | Yes | Code to reproduce all of our experiments is publicly available. A PyTorch implementation of the code is available at github.com/aks2203/deep-thinking.
Open Datasets | Yes | We evaluate our methods on the benchmark problems available in the Python package easy-to-hard-data, which generates reasoning problems of various difficulties. The three problems considered are computing prefix sums, finding the optimal path in a two-dimensional maze, and solving chess puzzles. We briefly review the input and output structures for each problem and refer the reader to Schwarzschild et al. [21] for more detail, including the data generation process.
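A hedged sketch of loading the benchmark data with a standard PyTorch DataLoader follows; the class names (PrefixSumDataset, MazeDataset) and keyword arguments (num_bits, size, download) are assumptions about the easy-to-hard-data interface and may not match the released package exactly.

```python
# Assumed easy-to-hard-data interface; class and keyword names are guesses
# for illustration and may differ from the released package.
from torch.utils.data import DataLoader
from easy_to_hard_data import PrefixSumDataset, MazeDataset  # assumed class names

train_bits = PrefixSumDataset("./data", num_bits=32, download=True)   # training regime
test_bits = PrefixSumDataset("./data", num_bits=512, download=True)   # extrapolation test
train_mazes = MazeDataset("./data", size=9, download=True)            # 9x9 training mazes

train_loader = DataLoader(train_bits, batch_size=128, shuffle=True)
test_loader = DataLoader(test_bits, batch_size=128, shuffle=False)
```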
Dataset Splits | Yes | "All of our training is done on 32-bit strings and we explore the behavior of our models on longer strings, even showing strong performance on 512-bits." and "All training in our work is done on 9×9 mazes. Despite this small training size, we benchmark extrapolation performance on mazes of size 59×59, and observe models solving mazes as large as 201×201 with high success rates." and "Problems are sorted by difficulty and we train on the first 600K easiest puzzles and test our models on indices ranging from 700K to 1.1M."
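To probe this kind of extrapolation, one can run a trained recurrent model for far more test-time iterations than the m = 30 used in training and record how often the entire output is correct after each iteration. The sketch below assumes the toy model(x, iters, interim_thought) interface from the training-step example; it is illustrative, not the authors' evaluation code.

```python
# Sketch of test-time extrapolation: iterate well past the training regime and
# track strict (every-position-correct) accuracy after each iteration.
import torch

@torch.no_grad()
def accuracy_over_iterations(model, loader, max_test_iters=500):
    correct = torch.zeros(max_test_iters)
    total = 0
    for x, y in loader:
        thought = None
        for i in range(max_test_iters):
            # One more recurrent iteration, reusing the running interim thought.
            logits, thought = model(x, iters=1, interim_thought=thought)
            preds = logits.argmax(dim=1)                  # [batch, length]
            correct[i] += (preds == y).all(dim=-1).sum()  # whole sequence must be right
        total += x.size(0)
    return correct / total
```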
Hardware Specification | No | The paper mentions pushing systems "to their limits (and the limits of our hardware)" but does not provide any specific details about the hardware used, such as GPU or CPU models.
Software Dependencies | No | A PyTorch implementation of the code is available at github.com/aks2203/deep-thinking. While PyTorch is mentioned, no specific version number is provided for PyTorch or any other software dependency.
Experiment Setup | Yes | "We train our models on 32-bit strings with a maximum of only m = 30 recurrent iterations (indicated in the figures as the shaded train regime section), 400 convolutional filters per layer, and incremental loss weight α = 1 when progressive loss is used." and "The best models in Figure 4 are DT-Recall models trained with 30 maximum iterations and a weighting in the loss of α = 0.01." and "Our best models are DT-Recall networks with 512 convolutional filters in each layer trained with a maximum of 30 iterations and a weight of α = 0.5." and "In our implementation, we randomly sample the number of iterations used to generate a partial solution, n, and the number of training iterations, k, budgeted for the network to improve this partial solution."
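For reference, the per-task settings quoted above can be collected into a single configuration, as in the sketch below; the dictionary keys are illustrative names rather than identifiers from the released code, and values not stated in the quoted text are left out or marked as such.

```python
# Best-model hyperparameters as quoted in the experiment-setup evidence above;
# key names are illustrative only.
BEST_CONFIGS = {
    # Prefix sums: architecture not named in the quoted text.
    "prefix_sums": {"filters": 400, "max_iters": 30, "alpha": 1.0},
    # Mazes: DT-Recall; filter width not quoted in this section.
    "mazes": {"arch": "DT-Recall", "max_iters": 30, "alpha": 0.01},
    # Chess puzzles: DT-Recall with 512 filters per layer.
    "chess": {"arch": "DT-Recall", "filters": 512, "max_iters": 30, "alpha": 0.5},
}
```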