Repository-Level Prompt Generation for Large Language Models of Code
Authors: Disha Shrivastava, Hugo Larochelle, Daniel Tarlow
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct experiments on the task of single-line code auto-completion using code repositories taken from Google Code archives. We demonstrate that an oracle constructed from our prompt proposals gives a relative improvement of 36% over Codex, showing the quality of these proposals. Further, we show that when we train a model to predict a prompt proposal, we can achieve significant performance gains over Codex and other baselines. |
| Researcher Affiliation | Collaboration | Disha Shrivastava (Mila, Université de Montréal), Hugo Larochelle (Mila, Université de Montréal, Google, CIFAR Associate Fellow), Daniel Tarlow (Mila, McGill University, Google). |
| Pseudocode | No | The paper describes the methods in prose but does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | We release our code, data, and trained checkpoints at https://github.com/shrivastavadisha/repo_level_prompt_generation. |
| Open Datasets | Yes | Instead, we scraped Google Code https://code.google.com/archive/ for repositories in Java (removing the ones that matched with a repository on GitHub with the same name). |
| Dataset Splits | Yes | We divided the repositories into train, validation, and test splits, where each repository in its entirety is part of a split. Train: 19 repositories, 2655 files, 92721 holes; Validation: 14 repositories, 1060 files, 48548 holes; Test: 14 repositories, 1308 files, 48288 holes; Total: 47 repositories, 4757 files, 189557 holes. |
| Hardware Specification | Yes | The computational complexity of training our larger RLPG-R variant (3.6M parameters, 141269 holes, and 9.19 minutes per epoch on a single Tesla V100 GPU) is much smaller than finetuning all or some part of Codex (175B parameters). ... Besides training the PPC, all our experiments were performed on a CPU with 8GB RAM. |
| Software Dependencies | Yes | We used the OpenAI Codex Completions API for generating the predicted hole from the Codex model. In particular, we used the code-davinci-001 engine with the temperature set to 0.0 and stop criteria as a newline. ... We used the tree-sitter API for Java ... For the BM25-based baselines, we use the Okapi BM25 implementation with default parameters given by the pip package rank-bm25 0.2.2. ... We used CodeBERT (Feng et al., 2020) as our pretrained model Fϕ... (A hedged BM25 usage sketch follows the table.) |
| Experiment Setup | Yes | We used the OpenAI Codex Completions API for generating the predicted hole from the Codex model. In particular, we used the code-davinci-001 engine with the temperature set to 0.0 and stop criteria as a newline. The completion length was 24 and the maximum prompt length was 4072. ... We used Adam (Kingma & Ba, 2015) optimizer with a learning rate of 3e-4 and batch size of 64. ... A dropout value of 0.25 was used while training. (Hedged sketches of the completion call and the optimizer configuration follow the table.) |
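
To make the reported decoding settings concrete, here is a minimal sketch of how the completion call could look through the legacy OpenAI Completions API (the interface available when the paper was written). The prompt contents, the API-key handling, and the variable names are illustrative assumptions; only the engine name, temperature, stop criterion, and completion length come from the paper.

```python
import openai  # legacy (pre-1.0) OpenAI client, matching the API era of the paper

openai.api_key = "YOUR_API_KEY"  # assumption: key management is up to the user

# Hypothetical inputs: a prompt-proposal context drawn from elsewhere in the
# repository, concatenated with the code preceding the target hole.
proposal_context = "// method bodies from the imported file ...\n"
code_before_hole = "int area = width *"

response = openai.Completion.create(
    engine="code-davinci-001",  # Codex engine reported in the paper
    prompt=proposal_context + code_before_hole,  # kept within the 4072-token budget
    max_tokens=24,              # completion length used in the experiments
    temperature=0.0,            # deterministic decoding
    stop="\n",                  # stop at the end of the predicted line
)
predicted_hole = response["choices"][0]["text"]
print(predicted_hole)
```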
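
The BM25 baseline can likewise be approximated with the rank-bm25 package cited above. The whitespace tokenization, the toy repository files, and the decision to rank files against the hole's surrounding context are assumptions made for illustration; only the package and its default Okapi parameters are from the paper.

```python
from rank_bm25 import BM25Okapi  # pip install rank-bm25==0.2.2

# Hypothetical corpus: contents of the other files in the repository.
repo_files = [
    "public class Vector { double x; double y; }",
    "public class Matrix { double[][] values; }",
    "import java.util.List; public class Store { List<String> items; }",
]
tokenized_corpus = [doc.split() for doc in repo_files]  # naive whitespace tokens
bm25 = BM25Okapi(tokenized_corpus)  # default Okapi BM25 parameters

# Query: the code surrounding the target hole.
query = "double x = vector.x + other.x;".split()
scores = bm25.get_scores(query)  # one relevance score per repository file

# Use the highest-scoring file as retrieved context for the prompt.
best_file = repo_files[scores.argmax()]
print(best_file)
```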
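
Finally, the optimizer settings in the setup row map directly onto a standard PyTorch configuration. The network below is a placeholder stand-in (the paper's prompt proposal classifier is not reproduced here), and the layer sizes and proposal count are assumptions; only the Adam learning rate, batch size, and dropout value are taken from the paper.

```python
import torch
import torch.nn as nn

# Placeholder stand-in for the prompt proposal classifier: the real model
# scores prompt proposals from CodeBERT-encoded context representations.
num_proposals = 63  # assumption for illustration, not a figure from this table
model = nn.Sequential(
    nn.Linear(768, 256),            # 768 = CodeBERT hidden size
    nn.ReLU(),
    nn.Dropout(p=0.25),             # dropout value reported in the paper
    nn.Linear(256, num_proposals),  # one logit per candidate prompt proposal
)

optimizer = torch.optim.Adam(model.parameters(), lr=3e-4)  # reported learning rate
batch_size = 64  # reported batch size for training
```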