Pragmatic Code Autocomplete
Authors: Gabriel Poesia, Noah Goodman
AAAI 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our system first proposes a set of strings that can be abbreviated by the user. Using only 100 abbreviations, we observe that a corpus of Python code can be compressed by 15%, a number that can be improved even further by specializing the abbreviations to a particular code base. We then use a contextualized sequence-to-sequence model to rank potential expansions of inputs that include abbreviations. In an offline reconstruction task our model achieves accuracies ranging from 93% to 99%, depending on the programming language and user settings. The model is small enough to run on a commodity CPU in real-time. We evaluate the usability of our system in a user study, integrating it in Microsoft VSCode, a popular code text editor. We observe that our system performs well and is complementary to traditional autocomplete features. |
| Researcher Affiliation | Academia | Gabriel Poesia, Noah Goodman Department of Computer Science, Stanford University, Stanford, CA 94305 {poesia,ndgoodman}@stanford.edu |
| Pseudocode | Yes | Function FindAbbreviatableSet(D, n): F[·] ← 0; foreach l ∈ D do foreach t ∈ tokenize(l) do F[t] ← F[t] + 1 end end; T ← F.keys().sort(λt. (\|t\| − 1) · F[t]); return T[1...n]. Algorithm 1: Finds a list of n tokens that maximize compression on a dataset D of lines of code, when those tokens are abbreviated to a single character. Function CapCollisions(T, c): T′ ← []; for i ← 1 to \|T\| do if T′.count(λt. t₁ = T[i]₁) < c then T′.append(T[i]) end; return T′. Algorithm 2: Removes elements from a list of tokens T so that the number of potential expansions for any abbreviation is limited by c. This allows users to smoothly trade off compression for higher accuracy. (A minimal Python sketch of both procedures appears after the table.) |
| Open Source Code | Yes | Code is available at https://github.com/gpoesia/magicomplete |
| Open Datasets | Yes | To train and evaluate our models, we collected a dataset of code from open source repositories in Python, Java and JavaScript, the 3 programming languages that were ranked as the most popular in the Stack Overflow Developer Survey 2020. We used the Github API to collect a list of 7 million public repositories. |
| Dataset Splits | Yes | We split these files into training/validation/test sets using a standard 80%/10%/10% split. |
| Hardware Specification | Yes | We ran all experiments on a server with an NVIDIA TITAN Xp GPU with 12GB of memory. |
| Software Dependencies | Yes | We implemented our algorithms and models using the PyTorch 1.6.0 library (Paszke et al. 2019). |
| Experiment Setup | Yes | Architecture details and parameters can be found in the Appendix. The neural model we described in Section 4 uses an LSTM with 512 hidden units for both the bidirectional encoder and the decoder, and 50-dimensional character embeddings. We embed context into a 128-dimensional vector. All models were trained for 10 epochs, using the Adam optimizer with a learning rate of 0.0005 and β = (0.9, 0.999), and a batch size of 512. The learning rate was set by doing a simple search over 10^(-i) and 5·10^(-i) for i ∈ {1, 2, 3, 4, 5}, using validation accuracy to pick the value used in later experiments. (A hedged PyTorch sketch of this setup appears below the table.) |
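To make the pseudocode row concrete, here is a minimal Python sketch of Algorithms 1 and 2. It assumes whitespace splitting as a stand-in for the paper's tokenizer; the function names `find_abbreviatable_set` and `cap_collisions` and the toy corpus are illustrative, not the authors' code.

```python
from collections import Counter
from typing import Callable, List

def find_abbreviatable_set(lines: List[str], n: int,
                           tokenize: Callable[[str], List[str]] = str.split) -> List[str]:
    """Algorithm 1: pick the n tokens that maximize compression when each
    is abbreviated to a single character (saving len(t) - 1 characters per use)."""
    freq = Counter()
    for line in lines:
        for token in tokenize(line):
            freq[token] += 1
    # Rank tokens by total characters saved, largest first.
    ranked = sorted(freq, key=lambda t: (len(t) - 1) * freq[t], reverse=True)
    return ranked[:n]

def cap_collisions(tokens: List[str], c: int) -> List[str]:
    """Algorithm 2: keep at most c tokens per first character, so any
    single-character abbreviation has at most c candidate expansions."""
    kept: List[str] = []
    for token in tokens:
        if sum(1 for t in kept if t[0] == token[0]) < c:
            kept.append(token)
    return kept

# Hypothetical usage on a toy corpus; the paper reports that n = 100
# abbreviations compress a Python corpus by roughly 15%.
corpus = ["import numpy as np", "def forward(self, input):", "return self.linear(input)"]
abbreviations = cap_collisions(find_abbreviatable_set(corpus, n=100), c=3)
```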
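The experiment-setup row lists model dimensions and optimizer settings; the sketch below assembles them into a runnable PyTorch skeleton. It assumes a character-level bidirectional LSTM encoder, an LSTM decoder conditioned at every step on a 128-dimensional context embedding, and a categorical context vocabulary; the class name `AbbreviationExpander` and the way context is injected are assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class AbbreviationExpander(nn.Module):
    """Illustrative seq2seq skeleton matching the reported sizes: 50-dim character
    embeddings, 512 hidden units for the bidirectional encoder and the decoder,
    and a 128-dim context embedding."""
    def __init__(self, vocab_size: int, n_contexts: int):
        super().__init__()
        self.char_embedding = nn.Embedding(vocab_size, 50)
        self.context_embedding = nn.Embedding(n_contexts, 128)
        self.encoder = nn.LSTM(50, 512, batch_first=True, bidirectional=True)
        self.decoder = nn.LSTM(50 + 128, 512, batch_first=True)
        self.output = nn.Linear(512, vocab_size)

    def forward(self, src_chars, tgt_chars, context_ids):
        # Encode the (possibly abbreviated) input line character by character.
        _, (h, c) = self.encoder(self.char_embedding(src_chars))
        # Merge the two encoder directions into the decoder's initial state.
        h0 = (h[0] + h[1]).unsqueeze(0)
        c0 = (c[0] + c[1]).unsqueeze(0)
        # Condition every decoder step on the 128-dim context embedding.
        ctx = self.context_embedding(context_ids).unsqueeze(1)
        ctx = ctx.expand(-1, tgt_chars.size(1), -1)
        dec_in = torch.cat([self.char_embedding(tgt_chars), ctx], dim=-1)
        dec_out, _ = self.decoder(dec_in, (h0, c0))
        return self.output(dec_out)  # per-step logits over next characters

model = AbbreviationExpander(vocab_size=128, n_contexts=1000)
# Optimizer settings as reported: Adam, lr = 5e-4, betas = (0.9, 0.999);
# training used 10 epochs with batch size 512.
optimizer = torch.optim.Adam(model.parameters(), lr=5e-4, betas=(0.9, 0.999))
```

Keeping the recurrent layers this small is what allows real-time inference on a commodity CPU, as the paper notes.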