How to transfer algorithmic reasoning knowledge to learn new algorithms?

Authors: Louis-Pascal Xhonneux, Andreea-Ioana Deac, Petar Veličković, Jian Tang

NeurIPS 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To investigate this empirically we create a dataset including 9 algorithms and 3 different graph types. We validate this empirically and show how instead multi-task learning can be used to achieve the transfer of algorithmic reasoning knowledge.
Researcher Affiliation | Collaboration | Louis-Pascal A. C. Xhonneux, Université de Montréal, Mila, xhonneul@mila.quebec; Andreea Deac, Université de Montréal, Mila, deacandr@mila.quebec; Petar Veličković, DeepMind, London, UK, petarv@google.com; Jian Tang, HEC Montréal, Mila, jian.tang@hec.ca
Pseudocode | Yes | Algorithm 1 Parallel (a runnable Python sketch of this loop follows the table):
    Input: graph G, weights w, source index i
    initialise_nodes(G.vertices, i)
    repeat
        for (u, v) ∈ G.edges() do
            relax_edge(u, v, w)
        end for
    until none of the nodes change
Open Source Code | No | All code to generate data and train models will be released upon acceptance with an MIT license.
Open Datasets | No | To investigate this empirically we create a dataset including 9 algorithms and 3 different graph types.
Dataset Splits | No | We train using ADAM [28] with a learning rate of 0.0005, a batch size of 64, and use early stopping with a patience of 10 to prevent overfitting. We test on graphs of size 20, 50, and 100 nodes.
Hardware Specification | Yes | Each experiment was executed on a V100 GPU in less than 5 hours for the longest experiment.
Software Dependencies | No | The paper mentions using ADAM [28] as the optimizer but does not specify version numbers for any software dependencies such as programming languages or libraries.
Experiment Setup | Yes | For all experiments we use 5,000 graphs of each type (Erdos-Renyi (ER), Barabasi-Albert (BA), 2d-Grids (2d-G)) with 20 nodes each. We train using ADAM [28] with a learning rate of 0.0005, a batch size of 64, and use early stopping with a patience of 10 to prevent overfitting. We test on graphs of size 20, 50, and 100 nodes. The hidden embedding size is set to 32, except for NE++ in the multi-task experiments, where it is 16 to account for the additional expressivity of having several encoders. (Sketches of the graph sampling and of this training configuration follow the table.)
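
The pseudocode quoted for Algorithm 1 (Parallel) is a Bellman-Ford-style edge-relaxation loop. Below is a minimal, runnable Python sketch of that loop; the bodies of initialise_nodes and relax_edge are not given in the excerpt, so the min-plus relaxation and the snapshot-based "parallel" update used here are assumptions.

    import math

    def parallel_relaxation(edges, weights, num_nodes, source):
        # initialise_nodes: the source gets distance 0, every other node +inf
        dist = [math.inf] * num_nodes
        dist[source] = 0.0

        changed = True
        while changed:                       # "repeat ... until none of the nodes change"
            changed = False
            new_dist = list(dist)            # relax every edge against the same snapshot
            for (u, v) in edges:             # "for (u, v) in G.edges() do"
                candidate = dist[u] + weights[(u, v)]
                if candidate < new_dist[v]:  # relax_edge(u, v, w)
                    new_dist[v] = candidate
                    changed = True
            dist = new_dist
        return dist

    # Toy example: the path 0 -> 1 -> 2 beats the direct edge 0 -> 2
    edges = [(0, 1), (1, 2), (0, 2)]
    weights = {(0, 1): 1.0, (1, 2): 1.0, (0, 2): 5.0}
    print(parallel_relaxation(edges, weights, num_nodes=3, source=0))  # [0.0, 1.0, 2.0]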
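
The experiment setup names three graph distributions (Erdos-Renyi, Barabasi-Albert, 2d grids) with 5,000 graphs of 20 nodes each. The sketch below shows how such a dataset could be sampled with networkx; the edge probability p=0.3, the attachment parameter m=2, and the 4x5 grid shape are illustrative assumptions, since the excerpt does not state the generator parameters.

    import networkx as nx

    def sample_graphs(num_graphs=5000, num_nodes=20, seed=0):
        # One list of graphs per family named in the experiment setup.
        graphs = {"ER": [], "BA": [], "2d-G": []}
        for i in range(num_graphs):
            graphs["ER"].append(nx.erdos_renyi_graph(num_nodes, p=0.3, seed=seed + i))
            graphs["BA"].append(nx.barabasi_albert_graph(num_nodes, m=2, seed=seed + i))
            graphs["2d-G"].append(nx.grid_2d_graph(4, 5))  # 4 x 5 = 20 nodes
        return graphs

    graphs = sample_graphs(num_graphs=10)  # small run for illustration
    print({name: len(gs) for name, gs in graphs.items()})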
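
The training details in the table (ADAM, learning rate 0.0005, batch size 64, early stopping with patience 10) map onto a standard PyTorch training skeleton. The sketch below assumes a generic model that returns its training loss, a train_loader built with batch size 64, and a compute_val_loss callable; these stand-ins and the epoch cap are assumptions, not details from the paper.

    import copy
    import torch

    def train(model, train_loader, compute_val_loss, max_epochs=200):
        # Hyperparameters quoted in the table: Adam, lr=0.0005, patience 10.
        optimiser = torch.optim.Adam(model.parameters(), lr=0.0005)
        best_val, best_state, patience_left = float("inf"), None, 10

        for epoch in range(max_epochs):
            model.train()
            for batch in train_loader:
                optimiser.zero_grad()
                loss = model(batch)          # assumption: the model returns its training loss
                loss.backward()
                optimiser.step()

            val = compute_val_loss(model)
            if val < best_val:               # improvement: reset patience, keep best weights
                best_val, patience_left = val, 10
                best_state = copy.deepcopy(model.state_dict())
            else:
                patience_left -= 1
                if patience_left == 0:       # early stopping
                    break

        if best_state is not None:
            model.load_state_dict(best_state)
        return model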