End-to-end Algorithm Synthesis with Recurrent Networks: Extrapolation without Overthinking
Authors: Arpit Bansal, Avi Schwarzschild, Eitan Borgnia, Zeyad Emam, Furong Huang, Micah Goldblum, Tom Goldstein
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We experiment on benchmark problems for measuring extrapolation behavior. These tasks include computing prefix sums, finding optimal solutions for two-dimensional mazes, and solving chess puzzles [21]. Our architectures and training routines significantly outperform existing methods on all tasks in the benchmark suite. |
| Researcher Affiliation | Academia | Arpit Bansal (University of Maryland, bansal01@umd.edu); Avi Schwarzschild (University of Maryland, avi1@umd.edu); Eitan Borgnia (University of Chicago, eborgnia@uchicago.edu); Zeyad Emam (University of Maryland); Furong Huang (University of Maryland); Micah Goldblum (New York University); Tom Goldstein (University of Maryland) |
| Pseudocode | Yes | Algorithm 1 Incremental Progress Training Algorithm. Input: parameter vector θ, integer m, weight α. for batch_idx = 1, 2, ... do: Choose n ∼ U{0, m−1} and k ∼ U{1, m−n}; Compute φ_n with n iterations w/o tracking gradients; Compute ŷ_prog with additional k iterations; Compute ŷ_m with a new forward pass of m iterations; Compute L_max_iters with ŷ_m and L_progressive with ŷ_prog; Compute L = (1 − α) · L_max_iters + α · L_progressive; Compute ∇_θ L and update θ; end for. (A hedged PyTorch sketch of this loop is given after the table.) |
| Open Source Code | Yes | Code to reproduce all of our experiments is publicly available. A PyTorch implementation of the code is available at github.com/aks2203/deep-thinking. |
| Open Datasets | Yes | We evaluate our methods on the benchmark problems available in the Python package easy-to-hard-data, which generates reasoning problems of various difficulties. The three problems considered are computing prefix sums, finding the optimal path in a two-dimensional maze, and solving chess puzzles. We briefly review the input and output structures for each problem and refer the reader to Schwarzschild et al. [21] for more detail, including the data generation process. |
| Dataset Splits | Yes | "All of our training is done on 32-bit strings and we explore the behavior of our models on longer strings, even showing strong performance on 512-bits." and "All training in our work is done on 9×9 mazes. Despite this small training size, we benchmark extrapolation performance on mazes of size 59×59, and observe models solving mazes as large as 201×201 with high success rates." and "Problems are sorted by difficulty and we train on the first 600K easiest puzzles and test our models on indices ranging from 700K to 1.1M." |
| Hardware Specification | No | The paper mentions pushing systems 'to their limits (and the limits of our hardware)' but does not provide any specific details about the hardware used, such as GPU or CPU models. |
| Software Dependencies | No | A PyTorch implementation of the code is available at github.com/aks2203/deep-thinking. While PyTorch is mentioned, no specific version number is provided for PyTorch or any other software dependency. |
| Experiment Setup | Yes | "We train our models on 32-bit strings with a maximum of only m = 30 recurrent iterations (indicated in the figures as the shaded train regime section), 400 convolutional filters per layer, and incremental loss weight α = 1 when progressive loss is used." and "The best models in Figure 4 are DT-Recall models trained with 30 maximum iterations and a weighting in the loss of α = 0.01." and "Our best models are DT-Recall networks with 512 convolutional filters in each layer trained with a maximum of 30 iterations and a weight of α = 0.5." and "In our implementation, we randomly sample the number of iterations used to generate a partial solution, n, and the number of training iterations, k, budgeted for the network to improve this partial solution." (These settings are collected in the configuration sketch after the table.) |
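
The Pseudocode row above quotes Algorithm 1 (Incremental Progress Training). Below is a minimal PyTorch sketch of one training step under that algorithm. The `net.iterate(...)` interface, the `criterion`, and the optimizer handling are hypothetical stand-ins and do not reflect how the released deep-thinking code is organized; only the sampling of n and k, the gradient-free n-iteration prefix, and the α-weighted combination of the progressive and max-iteration losses follow the quoted pseudocode.

```python
import torch

def incremental_progress_step(net, x, y, m, alpha, optimizer, criterion):
    """One step of Algorithm 1 (Incremental Progress Training), sketched.

    `net.iterate(x, phi, iters)` is an assumed interface: it applies the
    recurrent block `iters` times starting from latent state `phi` (or from
    the network's initial state when `phi` is None) and returns
    (new_phi, prediction). The real repository is structured differently.
    """
    # Choose n ~ U{0, m-1} and k ~ U{1, m-n}.
    n = torch.randint(0, m, (1,)).item()
    k = torch.randint(1, m - n + 1, (1,)).item()

    # Compute phi_n with n iterations without tracking gradients.
    with torch.no_grad():
        phi_n, _ = net.iterate(x, phi=None, iters=n)

    # Compute y_prog with k additional iterations from the detached state.
    _, y_prog = net.iterate(x, phi=phi_n, iters=k)

    # Compute y_m with a new forward pass of m iterations.
    _, y_m = net.iterate(x, phi=None, iters=m)

    # L = (1 - alpha) * L_max_iters + alpha * L_progressive.
    loss = (1 - alpha) * criterion(y_m, y) + alpha * criterion(y_prog, y)

    # Compute gradients and update the parameters.
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Because the first n iterations are detached, the progressive term only rewards improving an arbitrary partial solution with k further iterations rather than unrolling one fixed computation, which is the behavior the quoted algorithm is designed to encourage.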
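
For quick reference, the hyperparameters quoted in the Experiment Setup and Dataset Splits rows can be gathered into a single configuration sketch. The dictionary below is purely illustrative: the keys and structure are ours, not the deep-thinking repository's; fields not quoted in the excerpts (e.g., the maze models' filter width) are omitted; and matching each quote to a task in the order prefix sums, mazes, chess is our assumption.

```python
# Per-task training settings quoted above; key names and structure are illustrative.
TRAIN_SETUPS = {
    "prefix_sums": {"train_inputs": "32-bit strings", "max_iters": 30, "filters": 400, "alpha": 1.0},
    "mazes":       {"train_inputs": "9x9 mazes", "max_iters": 30, "alpha": 0.01},
    "chess":       {"train_inputs": "first 600K easiest puzzles", "max_iters": 30, "filters": 512, "alpha": 0.5},
}
```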