Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026.

Rethinking Deep Thinking: Stable Learning of Algorithms using Lipschitz Constraints

Authors: Jay Bear, Adam Prügel-Bennett, Jonathon Hare

NeurIPS 2024 | Venue PDF | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We benchmark on the traveling salesperson problem to evaluate the capabilities of the modified system in an NP-hard problem where DT fails to learn. ... 5 Results on Easy-to-Hard Problems |
| Researcher Affiliation | Academia | Jay Bear Adam Prügel-Bennett Jonathon Hare The University of Southampton, Southampton, UK EMAIL |
| Pseudocode | No | The paper uses diagrams and formal equations to describe the architecture and process, but does not include a distinct pseudocode or algorithm block. |
| Open Source Code | Yes | code for the experiments can be found at https://github.com/Jay-Bear/rethinking-deep-thinking. |
| Open Datasets | Yes | We test on the three problem classes used by Bansal et al. [2] to evaluate DT-R, namely a prefix sum problem, a maze problem and a chess problem. ... from the Easy To Hard dataset [17] |
| Dataset Splits | Yes | shuffled and split into 80% training samples, 20% validation samples. |
| Hardware Specification | Yes | We have trained and evaluated the models on a range of different Nvidia GPU accelerators from RTX2080Tis to A100s, as well as on M3-series Apple Silicon. ... RTX8000, A100 |
| Software Dependencies | No | The paper mentions PyTorch [16] but does not specify its version. It also mentions the Adam optimizer [12], but no version number is provided for it or any other software dependency. |
| Experiment Setup | Yes | All models use the Adam optimizer [12] with a learning rate of 0.001, β1 = 0.9, β2 = 0.999, weight decay set to 0.0002 and only applied to unconstrained convolutional weights; incremental progress training with α = 0.5; exponential warmup with a warmup period of 3; a multi-step learning rate scheduler where milestones are calculated as a 8 : 4 : 2 : 1 ratio of the total number of epochs, with learning rates multiplied by 0.1 at each milestone. |
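
The dataset split and the learning-rate schedule quoted above can be illustrated with a minimal pure-Python sketch. It is not taken from the authors' repository. The function names, the seed, and the total of 150 epochs are hypothetical, and the reading of the 8 : 4 : 2 : 1 ratio as the relative lengths of four consecutive constant-learning-rate segments (giving three milestones) is an assumption. The exponential warmup and incremental progress training are omitted.

```python
import random


def train_val_split(n_samples, train_frac=0.8, seed=0):
    """Shuffle sample indices and split them into training and validation sets."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seeded shuffle (seed is hypothetical)
    cut = int(n_samples * train_frac)
    return indices[:cut], indices[cut:]


def milestones_from_ratio(total_epochs, ratio=(8, 4, 2, 1)):
    """Return the epochs at which the learning rate is decayed.

    ASSUMPTION: `ratio` gives the relative lengths of consecutive segments,
    so a milestone sits at the end of every segment except the last.
    """
    total = sum(ratio)
    milestones = []
    elapsed = 0
    for part in ratio[:-1]:
        elapsed += part
        milestones.append(round(total_epochs * elapsed / total))
    return milestones


def lr_at_epoch(epoch, milestones, base_lr=0.001, gamma=0.1):
    """Learning rate after multiplying by `gamma` at each milestone reached."""
    passed = sum(1 for m in milestones if epoch >= m)
    return base_lr * gamma ** passed


if __name__ == "__main__":
    train_idx, val_idx = train_val_split(100)
    print(len(train_idx), len(val_idx))
    ms = milestones_from_ratio(150)  # 150 epochs is a hypothetical total
    print(ms)
    for epoch in (0, ms[0], ms[1], ms[2]):
        print(epoch, lr_at_epoch(epoch, ms))
```

Under this reading, each decay segment is half as long as the one before it, so most of the training budget is spent at the initial learning rate of 0.001.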