LongCoder: A Long-Range Pre-trained Language Model for Code Completion

Authors: Daya Guo, Canwen Xu, Nan Duan, Jian Yin, Julian McAuley

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct experiments on a newly constructed dataset that contains longer code context and the publicly available CodeXGLUE benchmark. Experimental results demonstrate that LongCoder achieves superior performance on code completion tasks compared to previous models while maintaining comparable efficiency in terms of computational resources during inference.
Researcher Affiliation | Collaboration | Sun Yat-sen University; University of California, San Diego; Microsoft Research Asia.
Pseudocode | No | The paper describes the model architecture and attention mechanisms using mathematical equations but does not present pseudocode or a clearly labeled algorithm block.
Open Source Code | Yes | All the codes and data are available at https://github.com/microsoft/CodeBERT.
Open Datasets | Yes | To evaluate the effectiveness of LongCoder and encourage future research on Long Code Completion, we construct a new dataset called LCC... Specifically, we construct our datasets from the github-code dataset, which contains a vast number of code files sourced from GitHub with an open-source license that permits research use. (https://huggingface.co/datasets/codeparrot/github-code)
Dataset Splits | Yes | For each programming language, we sample 100k examples for training, 10k for development, and 10k for testing. (A sampling sketch follows the table.)
Hardware Specification | Yes | The inference memory consumption and runtime per example are calculated using beam search with a beam size of 5 and a maximum generation length of 64 on a single V100 GPU. (A measurement sketch follows the table.)
Software Dependencies | No | The paper mentions the 'Adam optimizer' and 'tree-sitter' but does not provide specific version numbers for these or other software dependencies.
Experiment Setup | Yes | During fine-tuning, we use the Adam optimizer with a batch size of 16 and a learning rate of 2e-4. We fine-tune the model for 10 epochs and perform early stopping on the development set. (A training-loop sketch follows the table.)
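The Dataset Splits row quotes per-language sizes of 100k/10k/10k drawn from the codeparrot/github-code corpus named in the Open Datasets row. The sketch below shows one way such sampling could be scripted; the streaming mode, the `language` field name, the shuffle buffer size, and the order in which splits are carved off are assumptions for illustration, not details from the paper.

```python
# Minimal sketch, under assumptions, of per-language sampling:
# 100k train / 10k dev / 10k test examples per language.
from itertools import islice
from datasets import load_dataset

SPLIT_SIZES = {"train": 100_000, "dev": 10_000, "test": 10_000}

def sample_language_splits(language: str, seed: int = 42):
    # Stream the corpus to avoid downloading all of it, keep one language,
    # shuffle with a buffer, then carve off the three splits in order.
    # Field name "language" and the buffer size are assumptions.
    stream = load_dataset("codeparrot/github-code", split="train", streaming=True)
    stream = stream.filter(lambda ex: ex["language"] == language)
    stream = iter(stream.shuffle(seed=seed, buffer_size=10_000))
    return {name: list(islice(stream, size)) for name, size in SPLIT_SIZES.items()}
```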
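The Experiment Setup row reports Adam, a batch size of 16, a learning rate of 2e-4, 10 epochs, and early stopping on the development set. The following sketch wires those hyperparameters into a plain PyTorch loop; the model object, the data collation, the patience value, and the checkpoint path are placeholders rather than details taken from the paper.

```python
# Hedged sketch of the reported fine-tuning setup: Adam, batch size 16, lr 2e-4,
# up to 10 epochs, early stopping on dev loss. Model/datasets are assumed to exist.
import torch
from torch.utils.data import DataLoader

def finetune(model, train_ds, dev_ds, device="cuda", patience=2):
    model.to(device)
    optimizer = torch.optim.Adam(model.parameters(), lr=2e-4)
    train_loader = DataLoader(train_ds, batch_size=16, shuffle=True)
    dev_loader = DataLoader(dev_ds, batch_size=16)
    best_dev, epochs_without_improvement = float("inf"), 0

    for epoch in range(10):
        model.train()
        for batch in train_loader:
            batch = {k: v.to(device) for k, v in batch.items()}
            # Assumes a Hugging Face-style causal LM that returns .loss
            # when labels are included in the batch.
            loss = model(**batch).loss
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()

        # Evaluate on the development set once per epoch.
        model.eval()
        with torch.no_grad():
            dev_loss = sum(
                model(**{k: v.to(device) for k, v in b.items()}).loss.item()
                for b in dev_loader
            ) / len(dev_loader)

        # Early stopping: keep the best checkpoint, stop after `patience`
        # epochs without improvement (patience value is an assumption).
        if dev_loss < best_dev:
            best_dev, epochs_without_improvement = dev_loss, 0
            torch.save(model.state_dict(), "best.pt")
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break
```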
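The Hardware Specification row states that inference memory and runtime per example were measured with beam search (beam size 5, maximum generation length 64) on a single V100 GPU. Below is a hedged sketch of such a measurement using a Hugging Face-style generate API; whether the length cap maps to max_new_tokens or to a total max_length, and how examples are batched, are assumptions.

```python
# Hedged sketch of a per-example inference measurement: beam search with 5 beams,
# up to 64 generated tokens, peak GPU memory and wall-clock time on one GPU.
import time
import torch

@torch.no_grad()
def measure_inference(model, tokenizer, context: str, device="cuda"):
    inputs = tokenizer(context, return_tensors="pt").to(device)
    torch.cuda.reset_peak_memory_stats(device)
    torch.cuda.synchronize(device)

    start = time.perf_counter()
    model.generate(**inputs, num_beams=5, max_new_tokens=64)
    torch.cuda.synchronize(device)
    runtime_s = time.perf_counter() - start

    peak_mem_gb = torch.cuda.max_memory_allocated(device) / 1024**3
    return runtime_s, peak_mem_gb
```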