How to transfer algorithmic reasoning knowledge to learn new algorithms?
Authors: Louis-Pascal Xhonneux, Andreea-Ioana Deac, Petar Veličković, Jian Tang
NeurIPS 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To investigate this empirically we create a dataset including 9 algorithms and 3 different graph types. We validate this empirically and show how instead multi-task learning can be used to achieve the transfer of algorithmic reasoning knowledge. |
| Researcher Affiliation | Collaboration | Louis-Pascal A. C. Xhonneux, Université de Montréal, Mila, xhonneul@mila.quebec; Andreea Deac, Université de Montréal, Mila, deacandr@mila.quebec; Petar Veličković, DeepMind, London, UK, petarv@google.com; Jian Tang, HEC Montréal, Mila, jian.tang@hec.ca |
| Pseudocode | Yes | Algorithm 1 Parallel. Input: graph G, weights w, source index i. initialise_nodes(G.vertices, i); repeat: for (u, v) ∈ G.edges() do relax_edge(u, v, w) end for; until none of the nodes change. (A runnable sketch of this loop follows the table.) |
| Open Source Code | No | All code to generate data and train models will be released upon acceptance with an MIT license. |
| Open Datasets | No | To investigate this empirically we create a dataset including 9 algorithms and 3 different graph types. |
| Dataset Splits | No | We train using ADAM [28] with a learning rate of 0.0005, a batch size of 64, and use early stopping with a patience of 10 to prevent overfitting. We test on graphs of 20, 50, and 100 nodes. |
| Hardware Specification | Yes | Each experiment was executed on a V100 GPU in less than 5 hours for the longest experiment. |
| Software Dependencies | No | The paper mentions using ADAM [28] as the optimizer but does not specify version numbers for any software dependencies like programming languages or libraries. |
| Experiment Setup | Yes | For all experiments we use 5,000 graphs of each type (Erdős–Rényi (ER), Barabási–Albert (BA), 2d-Grids (2d-G)) with 20 nodes each. We train using ADAM [28] with a learning rate of 0.0005, a batch size of 64, and use early stopping with a patience of 10 to prevent overfitting. We test on graphs of 20, 50, and 100 nodes. The hidden embedding size is set to 32, except for NE++ in the multi-task experiments, where it is 16 to account for the additional expressivity of having several encoders. (See the configuration sketch after the table.) |
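
For readers who want to execute the quoted pseudocode, the following is a minimal Python sketch of the parallel relaxation loop (a synchronous, Bellman-Ford-style update). The function name `parallel_relaxation` and the data layout (an edge list plus a weight dictionary) are assumptions made for illustration; the paper's own implementation is not released here.

```python
def parallel_relaxation(num_nodes, edges, weights, source):
    """Relax every edge against the same snapshot until no node changes.

    edges   -- list of (u, v) index pairs
    weights -- dict mapping (u, v) -> edge weight
    source  -- index of the source node
    """
    INF = float("inf")
    # initialise_nodes: source distance 0, all other nodes infinity
    dist = [INF] * num_nodes
    dist[source] = 0.0

    changed = True
    while changed:  # "repeat ... until none of the nodes change"
        changed = False
        # Collect updates from the current state before applying them,
        # so all edges are relaxed in parallel against one snapshot.
        new_dist = list(dist)
        for (u, v) in edges:  # relax_edge(u, v, w)
            candidate = dist[u] + weights[(u, v)]
            if candidate < new_dist[v]:
                new_dist[v] = candidate
                changed = True
        dist = new_dist
    return dist


# Small usage example on a 3-node graph.
edges = [(0, 1), (1, 2), (0, 2)]
weights = {(0, 1): 1.0, (1, 2): 1.0, (0, 2): 5.0}
print(parallel_relaxation(3, edges, weights, source=0))  # [0.0, 1.0, 2.0]
```

The synchronous snapshot (`new_dist`) is what makes the loop "parallel": every edge relaxation in an iteration reads the same distances, mirroring the per-step supervision used when training neural executors.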
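
Likewise, the quoted experiment setup can be illustrated with a short configuration sketch. Only the graph families, node count, graph count, optimiser, learning rate, batch size, and patience come from the table above; the generator parameters (ER edge probability, BA attachment count, grid shape), the use of networkx and PyTorch, and the placeholder model are assumptions.

```python
import networkx as nx
import torch

NUM_GRAPHS, NUM_NODES = 5000, 20  # 5,000 graphs of each type, 20 nodes each

def sample_graph(family: str) -> nx.Graph:
    if family == "ER":    # Erdős–Rényi; edge probability is an assumption
        return nx.erdos_renyi_graph(NUM_NODES, p=0.3)
    if family == "BA":    # Barabási–Albert; attachment count is an assumption
        return nx.barabasi_albert_graph(NUM_NODES, m=2)
    if family == "2d-G":  # 2d grid; 4 x 5 = 20 nodes is an assumed shape
        return nx.convert_node_labels_to_integers(nx.grid_2d_graph(4, 5))
    raise ValueError(family)

datasets = {fam: [sample_graph(fam) for _ in range(NUM_GRAPHS)]
            for fam in ("ER", "BA", "2d-G")}

# Optimiser settings quoted from the paper; `model` is a stand-in module,
# not the authors' processor network.
model = torch.nn.Linear(32, 32)
optimizer = torch.optim.Adam(model.parameters(), lr=0.0005)
BATCH_SIZE, PATIENCE = 64, 10  # early stopping patience of 10
```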