Learning to Complete Code with Sketches

Authors: Daya Guo, Alexey Svyatkovskiy, Jian Yin, Nan Duan, Marc Brockschmidt, Miltiadis Allamanis

Venue: ICLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "In our experiments, GRAMMFORMER generates 10-50% more accurate completions compared to traditional generative models and 37-50% longer sketches compared to sketch-generating baselines trained with similar techniques."
Researcher Affiliation | Collaboration | Daya Guo, School of Computer Science and Engineering, Sun Yat-sen University, China (guody5@mail2.sysu.edu.cn); Alexey Svyatkovskiy, Microsoft, Redmond, WA, USA (alsvyatk@microsoft.com); Jian Yin, School of Computer Science and Engineering, Sun Yat-sen University, China (issjyin@mail.sysu.edu.cn); Nan Duan, Microsoft Research, Beijing, China (nanduan@microsoft.com); Marc Brockschmidt and Miltiadis Allamanis, Microsoft Research, Cambridge, UK ({mabrocks,miallama}@microsoft.com)
Pseudocode | Yes | "Algorithm 1 GRAMMFORMER generative process, given an input sequence x^(0)." (illustrated in the generative-process sketch below)
Open Source Code | No | The paper does not provide concrete access to source code for the methodology described.
Open Datasets | No | "To collect a dataset, we clone all non-fork repositories with more than 20 stars on GitHub that have C# or Python as their top language."
Dataset Splits | Yes | "Finally, we split the files into 70-10-20 train-validation-test." (see the split sketch below)
Hardware Specification | Yes | "Training used 64 NVIDIA Tesla P100 with 16GB memory for 10 days."
Software Dependencies | Yes | "Finally, we parse all files into a syntax tree using tree-sitter, ignoring any files that cannot be parsed using the v0.19.0 grammar definitions." (see the parsing sketch below)
Experiment Setup | Yes | "Most of our models use a 6-layer Transformer as encoder and 6-layer Transformer as decoder, each with a hidden dimension of 768 and 12 attention heads, with the exception of the LM model (and its variations), which uses a single 12-layer Transformer, to match the number of parameters of the other models. We set the intermediate dimension of each Transformer layer as 3072... We set max length of input and output sequences as 512 and 64, respectively. We train the model with Adam optimiser using a learning rate of 2e-5 and batch size 4,096." (see the configuration sketch below)
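The Pseudocode row refers to Algorithm 1, the GRAMMFORMER generative process. The sketch below is a rough illustration only, not the authors' implementation: it assumes a hypothetical `model` object with `is_nonterminal`, `select_nonterminal`, and `expand` methods standing in for the paper's learned selection and expansion components. At each step a nonterminal is selected and either rewritten by a generated expansion or frozen as a hole that remains in the final sketch.

```python
# Hypothetical sketch of a Grammformer-style generative loop.
# `model` is a placeholder for the learned selector/expander; HOLE marks
# an unexpanded nonterminal that is kept in the emitted sketch.

HOLE = "<hole>"

def generate_sketch(model, tokens, max_steps=64):
    """Iteratively rewrite nonterminals in `tokens` until none remain."""
    tokens = list(tokens)
    for _ in range(max_steps):
        nonterminal_positions = [i for i, t in enumerate(tokens)
                                 if model.is_nonterminal(t)]
        if not nonterminal_positions:
            break  # fully expanded (or all remaining positions are holes)
        # Learned selector picks which nonterminal to expand next.
        pos = model.select_nonterminal(tokens, nonterminal_positions)
        # Learned expander returns either an expansion (a mix of terminals
        # and fresh nonterminals) or None to leave the position as a hole.
        expansion = model.expand(tokens, pos)
        if expansion is None:
            tokens[pos] = HOLE                 # HOLE is not a nonterminal
        else:
            tokens[pos:pos + 1] = expansion    # splice in the expansion
    return tokens
```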
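The Dataset Splits row reports a 70-10-20 train-validation-test split over files. A minimal sketch of such a split follows; the file list, random seed, and any repository-level grouping or deduplication the authors may have applied are assumptions, not details from the paper.

```python
import random

def split_files(file_paths, seed=0):
    """Shuffle files and split them 70-10-20 into train/validation/test."""
    files = list(file_paths)
    random.Random(seed).shuffle(files)
    n = len(files)
    n_train = int(0.7 * n)
    n_valid = int(0.1 * n)
    return (files[:n_train],
            files[n_train:n_train + n_valid],
            files[n_train + n_valid:])
```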
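The Software Dependencies row points to tree-sitter with the v0.19.0 grammar definitions. The sketch below shows one way to parse and filter files with the py-tree-sitter bindings; the grammar checkout and build paths are placeholders, and the binding API differs somewhat across versions.

```python
from tree_sitter import Language, Parser

# Build a shared library from a checked-out grammar (placeholder paths;
# the paper pins the v0.19.0 grammar definitions).
Language.build_library("build/languages.so", ["vendor/tree-sitter-python"])
PY_LANGUAGE = Language("build/languages.so", "python")

parser = Parser()
parser.set_language(PY_LANGUAGE)

def parse_or_skip(source_code: str):
    """Return the syntax tree, or None if the file cannot be parsed."""
    tree = parser.parse(source_code.encode("utf8"))
    # tree-sitter inserts ERROR nodes rather than failing outright, so
    # treat any error in the tree as "cannot be parsed" and skip the file.
    return None if tree.root_node.has_error else tree
```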
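The Experiment Setup row lists the main hyperparameters. To make the numbers concrete, the configuration sketch below instantiates a comparably shaped encoder-decoder Transformer and optimiser in PyTorch; it omits embeddings, the vocabulary, and the Grammformer-specific decoding heads, so it is not the authors' model.

```python
import torch.nn as nn
import torch.optim as optim

# Hyperparameters as reported in the paper.
HIDDEN_DIM = 768       # hidden dimension of each Transformer layer
NUM_HEADS = 12         # attention heads
NUM_LAYERS = 6         # 6 encoder layers and 6 decoder layers
FFN_DIM = 3072         # intermediate (feed-forward) dimension
MAX_INPUT_LEN = 512    # max input sequence length
MAX_OUTPUT_LEN = 64    # max output sequence length
LEARNING_RATE = 2e-5
BATCH_SIZE = 4096

# Skeleton encoder-decoder with matching shape parameters; a usable model
# would add token/position embeddings and an output projection.
model = nn.Transformer(
    d_model=HIDDEN_DIM,
    nhead=NUM_HEADS,
    num_encoder_layers=NUM_LAYERS,
    num_decoder_layers=NUM_LAYERS,
    dim_feedforward=FFN_DIM,
    batch_first=True,
)

optimizer = optim.Adam(model.parameters(), lr=LEARNING_RATE)
```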