Repoformer: Selective Retrieval for Repository-Level Code Completion

Authors: Di Wu, Wasi Uddin Ahmad, Dejiao Zhang, Murali Krishna Ramanathan, Xiaofei Ma

ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Using this LM as both the selective RAG policy and the generation model, our framework achieves state-of-the-art repository-level code completion performance on diverse benchmarks including RepoEval, CrossCodeEval, and CrossCodeLongEval, a new long-form code completion benchmark.
Researcher Affiliation | Collaboration | 1 University of California, Los Angeles; 2 AWS AI Labs.
Pseudocode | Yes | The full algorithms are presented in Appendix D: Algorithm 1, REPOFORMER Training Data Creation (Chunk Completion); Algorithm 2, REPOFORMER Training Data Creation (Function Completion).
Open Source Code | No | To facilitate future research on repository-level code completion, we will release our implementation and the CrossCodeLongEval benchmark at https://repoformer.github.io/.
Open Datasets | Yes | We leverage large-scale permissively licensed repositories from the Stack (Kocetkov et al., 2022) and create the fine-tuning data via a three-step procedure: [...] We perform comprehensive evaluations on a range of repository-level code completion tasks from RepoEval (Zhang et al., 2023), CrossCodeEval (Ding et al., 2023), and CrossCodeLongEval, a new large-scale benchmark focusing on code chunk and function completion.
Dataset Splits | Yes | We reserve 500 repositories for validation and use the rest for training.
Hardware Specification | Yes | The models are trained for 2 epochs, which approximately takes 8, 12, 20, and 50 hours for the 1B/3B/7B/16B models respectively with 8 Nvidia A100 GPUs (40G memory). Using this latency model, we benchmark the latency of various selective retrieval settings on RepoEval with the vllm library (Kwon et al., 2023) on a single Nvidia A100 GPU (80G). (See the vLLM latency sketch below.)
Software Dependencies | No | The paper mentions 'vllm library (Kwon et al., 2023)' and 'tree-sitter' but does not specify version numbers for these or other software dependencies.
Experiment Setup | Yes | We fine-tune the 1B, 3B, 7B, and 16B variants of StarCoderBase with λ = 1.0, maximum sequence length 2048, learning rate 2e-5, batch size 512, 50 warmup steps, and a linear learning rate decay. The models are trained for 2 epochs. (See the configuration sketch below.)
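
The quoted hyperparameters map directly onto a standard causal-LM fine-tuning configuration. The following is a minimal sketch assuming a Hugging Face transformers Trainer workflow; the checkpoint name, the decomposition of the global batch size of 512 across devices, and the handling of the λ-weighted training objective are assumptions rather than details from the authors' implementation.

```python
# Illustrative fine-tuning configuration mirroring the quoted hyperparameters.
# NOT the authors' released code: the checkpoint name and the batch-size
# decomposition (8 GPUs x per-device 8 x accumulation 8 = 512) are assumptions,
# and the lambda-weighted selective-retrieval objective is not modeled here.
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments

model_name = "bigcode/starcoderbase-1b"  # 1B variant; 3B/7B/16B are analogous
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

training_args = TrainingArguments(
    output_dir="repoformer-1b",
    num_train_epochs=2,              # "trained for 2 epochs"
    learning_rate=2e-5,              # learning rate 2e-5
    per_device_train_batch_size=8,   # assumed split of the global batch of 512
    gradient_accumulation_steps=8,
    warmup_steps=50,                 # 50 warmup steps
    lr_scheduler_type="linear",      # linear learning rate decay
    bf16=True,                       # assumed mixed precision on A100s
)

# A Trainer would then be constructed with a dataset of sequences packed to
# the maximum length of 2048 tokens:
#   Trainer(model=model, args=training_args, train_dataset=...).train()
```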
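For the latency benchmarking on a single A100 (80G), the paper relies on the vLLM library. The sketch below shows how such a single-GPU latency measurement can be set up with vLLM's offline inference API; the checkpoint, prompt, and generation length are placeholders rather than the authors' benchmarking harness.

```python
# Minimal single-GPU latency measurement with vLLM's offline inference API.
# The checkpoint, prompt, and max_tokens below are illustrative placeholders.
import time

from vllm import LLM, SamplingParams

llm = LLM(model="bigcode/starcoderbase-3b")            # hypothetical model choice
sampling = SamplingParams(temperature=0.0, max_tokens=64)

prompts = ["def load_repository_index(path):\n    "]   # placeholder completion prompt

start = time.perf_counter()
outputs = llm.generate(prompts, sampling)
elapsed = time.perf_counter() - start

print(f"Generated {len(prompts)} completion(s) in {elapsed:.3f}s")
print(outputs[0].outputs[0].text)
```

To compare selective-retrieval settings, the same measurement would be repeated with and without retrieved cross-file context prepended to the prompt.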