Learning to Represent Programs with Graphs

Authors: Miltiadis Allamanis, Marc Brockschmidt, Mahmoud Khademi

ICLR 2018

Reproducibility assessment. Each entry below lists a reproducibility variable, its result, and the supporting LLM response (quoted or paraphrased from the paper).
Research Type: Experimental
"We evaluate our models on a large dataset of 2.9 million lines of real-world source code, showing that our best model achieves 32.9% accuracy on the VARNAMING task and 85.5% accuracy on the VARMISUSE task, beating simpler baselines (cf. section 5)."
Researcher Affiliation: Collaboration
Miltiadis Allamanis, Microsoft Research, Cambridge, UK (miallama@microsoft.com); Marc Brockschmidt, Microsoft Research, Cambridge, UK (mabrocks@microsoft.com); Mahmoud Khademi, Simon Fraser University, Burnaby, BC, Canada (mkhademi@sfu.ca)
Pseudocode: No
The paper describes the methods for transforming source code into program graphs and the Gated Graph Neural Network (GGNN) model in detailed text, but it does not include any explicitly labeled pseudocode or algorithm blocks. (A minimal propagation sketch is given after this table.)
Open Source Code: Yes
"Our implementation of graph neural networks (on a simpler task) can be found at https://github.com/Microsoft/gated-graph-neural-network-samples and the dataset can be found at https://aka.ms/iclr18-prog-graphs-dataset. [...] Our (generic) implementation of GGNNs is available at https://github.com/Microsoft/gated-graph-neural-network-samples, using a simpler demonstration task."
Open Datasets: Yes
"Our implementation of graph neural networks (on a simpler task) can be found at https://github.com/Microsoft/gated-graph-neural-network-samples and the dataset can be found at https://aka.ms/iclr18-prog-graphs-dataset."
Dataset Splits: Yes
"We split the remaining 23 projects into train/validation/test sets in the proportion 60-10-30, splitting along files (i.e., all examples from one source file are in the same set)." (A file-level split sketch is given after this table.)
Hardware Specification: Yes
"Our TensorFlow (Abadi et al., 2016) implementation scales to 55 graphs per second during training and 219 graphs per second during test-time using a single NVIDIA GeForce GTX Titan X with graphs having on average 2,228 (median 936) nodes and 8,350 (median 3,274) edges and 8 GGNN unrolling iterations, all 20 edge types (forward and backward edges for 10 original edge types) and the size of the hidden layer set to 64."
Software Dependencies: No
The paper mentions using TensorFlow (Abadi et al., 2016) but does not specify a version number for TensorFlow or any other software dependencies, such as Python or specific libraries.
Experiment Setup: Yes
"Using the initial node representations, concatenated with an extra bit that is set to one for the candidate nodes v_{t,v}, we run GGNN propagation for 8 time steps. [...] the size of the hidden layer set to 64. [...] We train using a max-margin objective." (A sketch of the candidate bit and objective is given after this table.)
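
Since the paper provides no pseudocode, the following is a minimal NumPy sketch of GGNN propagation matching the quoted configuration (8 unrolling steps, 20 edge types, hidden size 64). All class, function, and variable names here are illustrative, not taken from the authors' code, and the weight initialization and GRU parameterization are assumptions.

```python
import numpy as np

# Hyperparameters taken from the quoted setup.
HIDDEN = 64           # size of the hidden layer per node
NUM_EDGE_TYPES = 20   # forward and backward edges for 10 original edge types
NUM_STEPS = 8         # GGNN unrolling iterations

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class GGNNSketch:
    """Illustrative Gated Graph Neural Network propagation (Li et al., 2016
    style), as described in the paper's text. Not the authors' implementation."""

    def __init__(self):
        # One message-passing weight matrix per edge type.
        self.edge_w = rng.normal(0.0, 0.1, (NUM_EDGE_TYPES, HIDDEN, HIDDEN))
        # GRU cell parameters: update gate z, reset gate r, candidate state.
        self.w_z = rng.normal(0.0, 0.1, (2 * HIDDEN, HIDDEN))
        self.w_r = rng.normal(0.0, 0.1, (2 * HIDDEN, HIDDEN))
        self.w_h = rng.normal(0.0, 0.1, (2 * HIDDEN, HIDDEN))

    def gru(self, msg, h):
        z = sigmoid(np.concatenate([msg, h], axis=-1) @ self.w_z)
        r = sigmoid(np.concatenate([msg, h], axis=-1) @ self.w_r)
        h_tilde = np.tanh(np.concatenate([msg, r * h], axis=-1) @ self.w_h)
        return (1.0 - z) * h + z * h_tilde

    def propagate(self, h, edges):
        # h: (num_nodes, HIDDEN) initial node states.
        # edges: list of (source, target, edge_type) triples.
        for _ in range(NUM_STEPS):
            msg = np.zeros_like(h)
            for src, tgt, etype in edges:
                # Each node sends its state through the edge-type matrix.
                msg[tgt] += h[src] @ self.edge_w[etype]
            h = self.gru(msg, h)  # gated state update per node
        return h
```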
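For the 60-10-30 file-level split, a sketch of the described procedure, assuming examples arrive as (file_path, example) pairs; the shuffle seed and data layout are assumptions, not from the paper.

```python
import random
from collections import defaultdict

def split_by_file(examples, seed=0):
    """Split examples 60/10/30 into train/validation/test along files,
    so that all examples from one source file land in the same set."""
    by_file = defaultdict(list)
    for path, ex in examples:
        by_file[path].append(ex)

    files = sorted(by_file)
    random.Random(seed).shuffle(files)

    n = len(files)
    train_files = files[: int(0.6 * n)]
    valid_files = files[int(0.6 * n) : int(0.7 * n)]
    test_files = files[int(0.7 * n) :]

    pick = lambda fs: [ex for f in fs for ex in by_file[f]]
    return pick(train_files), pick(valid_files), pick(test_files)
```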
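The experiment-setup quote combines a candidate-marker bit on the initial node states with a max-margin training objective. Below is a hedged sketch of both pieces, assuming each candidate receives a scalar score and a margin of 1.0; the margin value and the scoring function are assumptions, as the quote does not specify them.

```python
import numpy as np

def append_candidate_bit(node_states, candidate_ids):
    """Concatenate an extra bit onto every initial node state; the bit is 1
    for the candidate nodes v_{t,v} and 0 otherwise, per the quoted setup."""
    bit = np.zeros((node_states.shape[0], 1))
    bit[candidate_ids] = 1.0
    return np.concatenate([node_states, bit], axis=1)

def max_margin_loss(scores, correct_idx, margin=1.0):
    """Hinge-style max-margin objective: the correct candidate's score should
    exceed every other candidate's by at least `margin` (assumed to be 1.0)."""
    diffs = margin - (scores[correct_idx] - np.delete(scores, correct_idx))
    return np.maximum(0.0, diffs).sum()
```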