Generative Code Modeling with Graphs
Authors: Marc Brockschmidt, Miltiadis Allamanis, Alexander L. Gaunt, Oleksandr Polozov
ICLR 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | An experimental evaluation shows that our new model can generate semantically meaningful expressions, outperforming a range of strong baselines. |
| Researcher Affiliation | Industry | Marc Brockschmidt, Miltiadis Allamanis, Alexander Gaunt; Microsoft Research, Cambridge, UK; {mabrocks,miallama,algaunt}@microsoft.com. Oleksandr Polozov; Microsoft Research, Redmond, WA, USA; polozov@microsoft.com |
| Pseudocode | Yes | Algorithm 1: Pseudocode for Expand. Input: context c, partial AST a, node v to expand. Algorithm 2: Pseudocode for ComputeEdge. Input: partial AST a, node v. |
| Open Source Code | Yes | We have released the code for this on https://github.com/Microsoft/graph-based-code-modelling. |
| Open Datasets | Yes | We have collected a dataset for our ExprGen task from 593 highly-starred open-source C# projects on GitHub, removing any near-duplicate files, following the work of Lopes et al. (2017). Samples from our dataset can be found in the supplementary material. |
| Dataset Splits | Yes | We split the data into four separate sets. A test-only dataset is made up from 100k samples generated from 114 projects. The remaining data we split into training-validation-test sets (3 : 1 : 1), keeping all expressions collected from a single source file within a single fold. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments. |
| Software Dependencies | No | The paper mentions software components like GRU, GGNN, and the C# compiler Roslyn, but does not provide specific version numbers for these or other software dependencies. |
| Experiment Setup | No | The paper describes the training objective as "maximum likelihood objective without pre-trained components" and mentions "beam search decoding with beam width 5", but it does not provide specific hyperparameter values (e.g., learning rate, batch size, number of epochs) or detailed optimizer settings needed for reproduction. |
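
The Dataset Splits row describes a 3 : 1 : 1 training-validation-test split in which all expressions from one source file stay in one fold. A minimal sketch of one way to get that property is shown below. It is not the authors' procedure: the hash-based assignment, the function names `assign_fold` and `split_samples`, and the `(source_file, expression)` sample layout are all assumptions made for illustration, and hashing yields the 3 : 1 : 1 proportions only approximately, over many files.

```python
import hashlib
from collections import defaultdict


def assign_fold(source_file: str) -> str:
    """Map a source file path to a fold (hypothetical helper).

    Hashing the path makes the assignment deterministic, so every
    expression from the same file lands in the same fold.
    """
    digest = hashlib.sha256(source_file.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 5  # 5 buckets for 3:1:1
    if bucket < 3:
        return "train"
    return "valid" if bucket == 3 else "test"


def split_samples(samples):
    """Group (source_file, expression) pairs into folds by file."""
    folds = defaultdict(list)
    for source_file, expression in samples:
        folds[assign_fold(source_file)].append((source_file, expression))
    return folds


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    demo = [
        ("src/A.cs", "x + 1"),
        ("src/A.cs", "y.Length"),
        ("src/B.cs", "i < n"),
    ]
    for name, items in split_samples(demo).items():
        print(name, len(items))
```

Assigning folds per file rather than per expression avoids leaking near-identical expressions from one file across training and evaluation data, which is the concern the quoted split description addresses.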