Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Rethinking Deep Thinking: Stable Learning of Algorithms using Lipschitz Constraints
Authors: Jay Bear, Adam Prügel-Bennett, Jonathon Hare
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We benchmark on the traveling salesperson problem to evaluate the capabilities of the modified system in an NP-hard problem where DT fails to learn. ... (Section 5: Results on Easy-to-Hard Problems) |
| Researcher Affiliation | Academia | Jay Bear, Adam Prügel-Bennett, Jonathon Hare; The University of Southampton, Southampton, UK; EMAIL |
| Pseudocode | No | The paper uses diagrams and formal equations to describe the architecture and process, but does not include a distinct pseudocode or algorithm block. |
| Open Source Code | Yes | Code for the experiments can be found at https://github.com/Jay-Bear/rethinking-deep-thinking. |
| Open Datasets | Yes | We test on the three problem classes used by Bansal et al. [2] to evaluate DT-R, namely a prefix sum problem, a maze problem and a chess problem. ... from the Easy To Hard dataset [17] |
| Dataset Splits | Yes | shuffled and split into 80% training samples, 20% validation samples. |
| Hardware Specification | Yes | We have trained and evaluated the models on a range of different Nvidia GPU accelerators from RTX2080Tis to A100s, as well as on M3-series Apple Silicon. ... RTX8000, A100 |
| Software Dependencies | No | The paper mentions PyTorch [16] but does not specify its version. It also mentions the Adam optimizer [12], but no version numbers are given for it or for any other software dependency. |
| Experiment Setup | Yes | All models use the Adam optimizer [12] with a learning rate of 0.001, β1 = 0.9, β2 = 0.999, weight decay set to 0.0002 and only applied to unconstrained convolutional weights; incremental progress training with α = 0.5; exponential warmup with a warmup period of 3; a multi-step learning rate scheduler where milestones are calculated as an 8 : 4 : 2 : 1 ratio of the total number of epochs, with learning rates multiplied by 0.1 at each milestone. |
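The Dataset Splits and Experiment Setup rows describe a shuffled 80/20 split and a learning-rate schedule whose milestone rule is easy to misread. The framework-free Python below is a minimal sketch of one possible reading, not code from the authors' repository: `ratio_milestones`, `learning_rate`, `warmup_factor` and `split_80_20` are hypothetical helpers. Two points are assumptions: that the 8 : 4 : 2 : 1 ratio divides the epochs into consecutive segments with a milestone at the end of each segment except the last, and that the exponential warmup follows the shape 1 − exp(−(t + 1)/period) and multiplies the scheduled rate.

```python
import math
import random

BASE_LR = 0.001      # learning rate reported in the setup
GAMMA = 0.1          # multiplier applied at each milestone
WARMUP_PERIOD = 3    # warmup period reported in the setup


def ratio_milestones(total_epochs, ratio=(8, 4, 2, 1)):
    """Hypothetical reading of the 8:4:2:1 rule: split the epochs into
    consecutive segments proportional to `ratio` and place a milestone
    at the end of every segment except the last."""
    total_parts = sum(ratio)
    milestones = []
    cumulative = 0
    for part in ratio[:-1]:
        cumulative += part
        milestones.append(round(total_epochs * cumulative / total_parts))
    return milestones


def learning_rate(epoch, milestones, base_lr=BASE_LR, gamma=GAMMA):
    """Multi-step decay: multiply by gamma once per milestone reached."""
    passed = sum(1 for m in milestones if epoch >= m)
    return base_lr * gamma ** passed


def warmup_factor(step, period=WARMUP_PERIOD):
    """ASSUMED exponential warmup shape: rises from about 0.28 toward 1."""
    return 1.0 - math.exp(-(step + 1) / period)


def split_80_20(samples, seed=0):
    """Shuffle a copy of the samples, then keep 80% for training and
    the remaining 20% for validation."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(0.8 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


if __name__ == "__main__":
    ms = ratio_milestones(150)
    print("milestones:", ms)
    for epoch in (0, 80, 120, 140):
        # Effective rate under the assumption that warmup multiplies the schedule.
        print(epoch, learning_rate(epoch, ms) * warmup_factor(epoch))
    train, val = split_80_20(range(10))
    print(len(train), len(val))
```

Under this reading, a 150-epoch run gets milestones at epochs 80, 120 and 140, so the rate drops from 0.001 to 1e-6 over three decays. In PyTorch the same milestones could be passed to a multi-step scheduler; the ratio interpretation itself should be checked against the linked repository.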