Deep Equilibrium Algorithmic Reasoning

Authors: Dobrik Georgiev, Joseph Wilson, Davide Buffelli, Pietro Liò

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our empirical evidence, leveraging algorithms from the CLRS-30 benchmark, validates that one can train a network to solve algorithmic problems by directly finding the equilibrium. (Section 5, Evaluation Setup:) For each algorithm we generated 10^5/100/100-sized training/validation/test datasets.
Researcher Affiliation | Collaboration | Dobrik Georgiev, University of Cambridge (dgg30@cam.ac.uk); JJ Wilson, Independent Researcher (josephjwilson74@gmail.com); Davide Buffelli, MediaTek Research (davide.buffelli@mtkresearch.com); Pietro Liò, University of Cambridge (pl219@cam.ac.uk)
Pseudocode | Yes | Listing 1: BFS algorithm. Clearly implemented as "while b do c".
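To make the "while b do c" structure concrete, here is a minimal BFS sketch in that form (function and variable names are illustrative, not from the paper's Listing 1): the loop condition b is "the frontier is non-empty" and the body c expands one node.

```python
from collections import deque

def bfs_order(adj, source):
    """Breadth-first search in the 'while b do c' pattern:
    loop while the frontier is non-empty (b), expanding one node per step (c)."""
    visited = {source}
    frontier = deque([source])
    order = []
    while frontier:                    # b: frontier is non-empty
        node = frontier.popleft()      # c: expand the next node
        order.append(node)
        for nbr in adj.get(node, []):
            if nbr not in visited:
                visited.add(nbr)
                frontier.append(nbr)
    return order

# Example: a small diamond graph, visited in breadth-first order.
print(bfs_order({0: [1, 2], 1: [3], 2: [3]}, 0))  # → [0, 1, 2, 3]
```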
Open Source Code | Yes | Source code available here: https://github.com/HekpoMaH/DEAR
Open Datasets | Yes | Our empirical evidence, leveraging algorithms from the CLRS-30 benchmark, validates that one can train a network to solve algorithmic problems by directly finding the equilibrium.
Dataset Splits | Yes | For each algorithm we generated 10^5/100/100-sized training/validation/test datasets. Training sample sizes vary between 8 and 16 elements (uniformly randomly chosen); validation samples are of size 16.
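The sampling scheme quoted above can be sketched as follows (make_split_sizes is a hypothetical helper, not from the codebase): training graph sizes drawn uniformly from 8..16, validation graphs of size 16. The paper generates 10^5 training samples; num_train defaults to a smaller value here for illustration, and the test graph size is omitted because the quoted text does not state it.

```python
import random

def make_split_sizes(num_train=1000, num_val=100, seed=0):
    """Illustrative sketch of the described split-size scheme:
    training sizes uniform in [8, 16], validation sizes fixed at 16."""
    rng = random.Random(seed)
    train_sizes = [rng.randint(8, 16) for _ in range(num_train)]
    val_sizes = [16] * num_val
    return train_sizes, val_sizes

train_sizes, val_sizes = make_split_sizes()
```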
Hardware Specification | Yes | If run on a single 4090 GPU, one would need about 3 weeks of total compute.
Software Dependencies | Yes | The final node embeddings, from which the output is decoded, are the solution to the equation: H(∞) = P_UE(H(∞)) (4), where P_UE(H(∞)) = P(H(∞), U, E), U/E are the encoded node and edge feature matrices, and P is the processor function. H(t) are the stacked latent states of the nodes at timestep t (with H(0) = 0). Equation 4 matches the signature of Equation 2 and can be solved via root-finding (we employ torchdeq [19]; MIT License), as if it were the f_θ of a DEQ. Our differences are mostly required by software engineering rather than research, hence they live here. Differences are: different DL framework (PyTorch [36])
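As a rough illustration of solving Equation 4, here is a naive fixed-point iteration (the paper itself uses torchdeq's root-finders; solve_equilibrium is a hypothetical helper, and the processor P below is a stand-in contraction, not the paper's GNN processor):

```python
import numpy as np

def solve_equilibrium(P, U, E, dim, max_iter=100, tol=1e-6):
    """Naive fixed-point iteration for H* = P(H*, U, E) (Equation 4):
    start from H(0) = 0 and iterate until the update norm falls below tol."""
    H = np.zeros(dim)
    for _ in range(max_iter):
        H_next = P(H, U, E)
        if np.linalg.norm(H_next - H) < tol:
            return H_next
        H = H_next
    return H

# Stand-in contractive processor: H* = 0.5 H* + U has fixed point H* = 2U.
P = lambda H, U, E: 0.5 * H + U
U = np.ones(4)
H_star = solve_equilibrium(P, U, None, dim=4)
```

A contraction guarantees convergence here; in the paper, well-posedness is instead handled by the DEQ machinery of torchdeq.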
Experiment Setup | Yes | In our experiments the models have a latent dimensionality of 128, the batch size is 32, the learning rate is 3×10⁻⁴ and we use the Adam optimizer [31]. We train our algorithmic reasoners for 100 epochs, choosing the model with the lowest task validation loss (discounting any regularisation; focusing on performance only) for testing.
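The reported hyperparameters and checkpoint-selection rule can be sketched as follows (CONFIG and select_best are illustrative names, not from the paper's codebase):

```python
# Hyperparameters as reported in the row above.
CONFIG = {
    "latent_dim": 128,
    "batch_size": 32,
    "learning_rate": 3e-4,
    "optimizer": "Adam",
    "epochs": 100,
}

def select_best(val_losses):
    """Pick the epoch with the lowest task validation loss
    (regularisation terms excluded, per the paper)."""
    best_epoch = min(range(len(val_losses)), key=val_losses.__getitem__)
    return best_epoch, val_losses[best_epoch]

# Example: epoch 1 has the lowest validation loss.
best = select_best([0.5, 0.2, 0.3])  # → (1, 0.2)
```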