End-to-end Algorithm Synthesis with Recurrent Networks: Extrapolation without Overthinking

Authors: Arpit Bansal, Avi Schwarzschild, Eitan Borgnia, Zeyad Emam, Furong Huang, Micah Goldblum, Tom Goldstein

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We experiment on benchmark problems for measuring extrapolation behavior. These tasks include computing prefix sums, finding optimal solutions for two-dimensional mazes, and solving chess puzzles [21]. Our architectures and training routines significantly outperform existing methods on all tasks in the benchmark suite.
Researcher Affiliation | Academia | Arpit Bansal (University of Maryland, bansal01@umd.edu); Avi Schwarzschild (University of Maryland, avi1@umd.edu); Eitan Borgnia (University of Chicago, eborgnia@uchicago.edu); Zeyad Emam (University of Maryland); Furong Huang (University of Maryland); Micah Goldblum (New York University); Tom Goldstein (University of Maryland)
Pseudocode | Yes | Algorithm 1: Incremental Progress Training
    Input: parameter vector θ, integer m, weight α
    for batch_idx = 1, 2, ... do
        Choose n ~ U{0, m - 1} and k ~ U{1, m - n}
        Compute φ_n with n iterations w/o tracking gradients
        Compute ŷ_prog with k additional iterations
        Compute ŷ_m with a new forward pass of m iterations
        Compute L_max_iters with ŷ_m and L_progressive with ŷ_prog
        Compute L = (1 - α) L_max_iters + α L_progressive
        Compute ∇_θ L and update θ
    end for
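As a concrete illustration of Algorithm 1, the following minimal PyTorch sketch performs one incremental-progress training step. The recurrent interface (model(x, iters, interim_thought) returning a prediction and an interim thought), the TinyRecurrentNet toy architecture, and the cross-entropy losses are assumptions made for illustration, not the authors' released implementation.

```python
# Minimal sketch of Algorithm 1 (incremental progress training).
# The model interface and toy architecture below are illustrative assumptions,
# not the code from github.com/aks2203/deep-thinking.
import random
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyRecurrentNet(nn.Module):
    """Toy recurrent block for 1-D binary sequences (e.g. prefix sums)."""
    def __init__(self, width=64):
        super().__init__()
        self.embed = nn.Conv1d(1, width, 3, padding=1)
        # "Recall"-style block: the input is re-injected at every iteration.
        self.recur = nn.Conv1d(width + 1, width, 3, padding=1)
        self.head = nn.Conv1d(width, 2, 3, padding=1)

    def forward(self, x, iters, interim_thought=None):
        if interim_thought is None:
            interim_thought = F.relu(self.embed(x))
        for _ in range(iters):
            interim_thought = F.relu(self.recur(torch.cat([interim_thought, x], dim=1)))
        return self.head(interim_thought), interim_thought

def train_step(model, optimizer, x, y, m=30, alpha=0.5):
    """One batch of incremental progress training (sketch of Algorithm 1)."""
    n = random.randint(0, m - 1)   # n ~ U{0, m - 1}
    k = random.randint(1, m - n)   # k ~ U{1, m - n}

    # Partial solution φ_n: n iterations without tracking gradients.
    with torch.no_grad():
        _, phi_n = model(x, iters=n)

    # Progressive pass: k additional iterations starting from φ_n.
    y_prog, _ = model(x, iters=k, interim_thought=phi_n.detach())

    # Full pass: a fresh forward pass with m iterations.
    y_m, _ = model(x, iters=m)

    loss_max_iters = F.cross_entropy(y_m, y)
    loss_progressive = F.cross_entropy(y_prog, y)
    loss = (1 - alpha) * loss_max_iters + alpha * loss_progressive

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In this sketch, x would be a [batch, 1, length] float tensor of bits and y a [batch, length] tensor of integer targets; only the k progressive iterations and the fresh m-iteration pass contribute gradients, as in the algorithm above.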
Open Source Code | Yes | Code to reproduce all of our experiments is publicly available. A PyTorch implementation of the code is available at github.com/aks2203/deep-thinking.
Open Datasets | Yes | We evaluate our methods on the benchmark problems available in the Python package easy-to-hard-data, which generates reasoning problems of various difficulties. The three problems considered are computing prefix sums, finding the optimal path in a two-dimensional maze, and solving chess puzzles. We briefly review the input and output structures for each problem and refer the reader to Schwarzschild et al. [21] for more detail, including the data generation process.
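A hedged sketch of loading the benchmark data with a standard PyTorch DataLoader follows; the class names (PrefixSumDataset, MazeDataset) and keyword arguments (num_bits, size, download) are assumptions about the easy-to-hard-data interface and may not match the released package exactly.

```python
# Assumed easy-to-hard-data interface; class and keyword names are guesses
# for illustration and may differ from the released package.
from torch.utils.data import DataLoader
from easy_to_hard_data import PrefixSumDataset, MazeDataset  # assumed class names

train_bits = PrefixSumDataset("./data", num_bits=32, download=True)   # training regime
test_bits = PrefixSumDataset("./data", num_bits=512, download=True)   # extrapolation test
train_mazes = MazeDataset("./data", size=9, download=True)            # 9x9 training mazes

train_loader = DataLoader(train_bits, batch_size=128, shuffle=True)
test_loader = DataLoader(test_bits, batch_size=128, shuffle=False)
```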
Dataset Splits | Yes | "All of our training is done on 32-bit strings and we explore the behavior of our models on longer strings, even showing strong performance on 512-bits." and "All training in our work is done on 9×9 mazes. Despite this small training size, we benchmark extrapolation performance on mazes of size 59×59, and observe models solving mazes as large as 201×201 with high success rates." and "Problems are sorted by difficulty and we train on the first 600K easiest puzzles and test our models on indices ranging from 700K to 1.1M."
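To probe this kind of extrapolation, one can run a trained recurrent model for far more test-time iterations than the m = 30 used in training and record how often the entire output is correct after each iteration. The sketch below assumes the toy model(x, iters, interim_thought) interface from the training-step example; it is illustrative, not the authors' evaluation code.

```python
# Sketch of test-time extrapolation: iterate well past the training regime and
# track strict (every-position-correct) accuracy after each iteration.
import torch

@torch.no_grad()
def accuracy_over_iterations(model, loader, max_test_iters=500):
    correct = torch.zeros(max_test_iters)
    total = 0
    for x, y in loader:
        thought = None
        for i in range(max_test_iters):
            # One more recurrent iteration, reusing the running interim thought.
            logits, thought = model(x, iters=1, interim_thought=thought)
            preds = logits.argmax(dim=1)                  # [batch, length]
            correct[i] += (preds == y).all(dim=-1).sum()  # whole sequence must be right
        total += x.size(0)
    return correct / total
```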
Hardware Specification | No | The paper mentions pushing systems "to their limits (and the limits of our hardware)" but does not provide any specific details about the hardware used, such as GPU or CPU models.
Software Dependencies | No | A PyTorch implementation of the code is available at github.com/aks2203/deep-thinking. While PyTorch is mentioned, no specific version number is provided for PyTorch or any other software dependency.
Experiment Setup | Yes | "We train our models on 32-bit strings with a maximum of only m = 30 recurrent iterations (indicated in the figures as the shaded train regime section), 400 convolutional filters per layer, and incremental loss weight α = 1 when progressive loss is used." and "The best models in Figure 4 are DT-Recall models trained with 30 maximum iterations and a weighting in the loss of α = 0.01." and "Our best models are DT-Recall networks with 512 convolutional filters in each layer trained with a maximum of 30 iterations and a weight of α = 0.5." and "In our implementation, we randomly sample the number of iterations used to generate a partial solution, n, and the number of training iterations, k, budgeted for the network to improve this partial solution."
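For reference, the per-task settings quoted above can be collected into a single configuration, as in the sketch below; the dictionary keys are illustrative names rather than identifiers from the released code, and values not stated in the quoted text are left out or marked as such.

```python
# Best-model hyperparameters as quoted in the experiment-setup evidence above;
# key names are illustrative only.
BEST_CONFIGS = {
    # Prefix sums: architecture not named in the quoted text.
    "prefix_sums": {"filters": 400, "max_iters": 30, "alpha": 1.0},
    # Mazes: DT-Recall; filter width not quoted in this section.
    "mazes": {"arch": "DT-Recall", "max_iters": 30, "alpha": 0.01},
    # Chess puzzles: DT-Recall with 512 filters per layer.
    "chess": {"arch": "DT-Recall", "filters": 512, "max_iters": 30, "alpha": 0.5},
}
```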