Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et alK. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026..
Deep Equilibrium Algorithmic Reasoning
Authors: Dobrik Georgiev, Joseph Wilson, Davide Buffelli, Pietro Lió
NeurIPS 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our empirical evidence, leveraging algorithms from the CLRS-30 benchmark, validates that one can train a network to solve algorithmic problems by directly finding the equilibrium. 5 Evaluation Setup For each algorithm we generated 105/100/100-sized training/validation/test datasets. |
| Researcher Affiliation | Collaboration | Dobrik Georgiev University of Cambridge EMAIL JJ Wilson Independent Researcher EMAIL Davide Buffelli Media Tek Research EMAIL Pietro Liò University of Cambridge EMAIL |
| Pseudocode | Yes | Listing 1: BFS algorithm. Clearly implemented as while b do c. |
| Open Source Code | Yes | 0Source code available here: https://github.com/Hekpo Ma H/DEAR |
| Open Datasets | Yes | Our empirical evidence, leveraging algorithms from the CLRS-30 benchmark, validates that one can train a network to solve algorithmic problems by directly finding the equilibrium. |
| Dataset Splits | Yes | For each algorithm we generated 105/100/100-sized training/validation/test datasets. Training sample sizes vary between 8 and 16 elements (uniformly randomly chosen) validation samples are of size 16. |
| Hardware Specification | Yes | If run on a single 4090 GPU one would need about 3 weeks of total compute. |
| Software Dependencies | Yes | The final node embeddings, from which the output is decoded, are the solution to the equation: H( ) = PUE(H( )) (4) where PUE(H( )) = P(H( ), U, E), U/E are the encoded node and edge feature matrices and P is the processor function. H(t) are the stacked latent states of the nodes at timestep t (with H(0) = 0). The above Equation 4 matches the signature of Equation 2, and can be solved via root-finding (we employ torchdeq [19]; MIT License), as if it is fθ of a DEQ. Our differences are mostly required by software engineering rather than research, hence they live here. Differences are: Different DL framework (Pytorch [36]) |
| Experiment Setup | Yes | In our experiments the models have a latent dimensionality of 128, the batch size is 32, the learning rate is 3 10 4 and we use the Adam optimizer [31]. We train our algorithmic reasoners for 100 epochs, choosing the model with the lowest task validation loss (discounting any regularisation; focusing on performance only) for testing. |