Is Programming by Example Solved by LLMs?

Authors: Wen-Ding Li, Kevin Ellis

Venue: NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We experiment on classic domains such as lists and strings, and an uncommon graphics programming domain not well represented in typical pretraining data. We find that pretrained models are not effective at PBE, but that they can be fine-tuned for much higher performance, provided the test problems are in-distribution. We analyze empirically what causes these models to succeed and fail, and take steps toward understanding how to achieve better out-of-distribution generalization.
Researcher Affiliation | Academia | Wen-Ding Li, Cornell University (wl678@cornell.edu); Kevin Ellis, Cornell University (kellis@cornell.edu)
Pseudocode | No | The paper describes an 'algorithm' for adaptation and gives mathematical equations defining its steps, but it does not present them in a clearly labeled 'Pseudocode' or 'Algorithm' block of the kind typically seen in research papers.
Open Source Code | No | The paper mentions using and citing third-party open-source models and repositories (e.g., DeepSeek Coder, Jina embeddings, Fleet), but it does not provide a direct link to, or an explicit statement about, an open-source release of its own implementation code for the described methodology.
Open Datasets | Yes | For list functions we seed with 50 problems from Rule et al. 2024; for text editing, we consider seeding with either SyGuS or a 40-problem subset of PROSE; for LOGO we seed with 200 training-set problems in Wong et al. [43].
Dataset Splits | No | The paper mentions 'held-out test set' and 'test problems' but does not explicitly describe a distinct 'validation set' or 'validation split' as part of its data partitioning for experiments.
Hardware Specification | Yes | All experiments were performed on single-node machines (8x A6000 or 8x A100, etc.) without a multi-node distributed computing setup.
Software Dependencies | Yes | We fine-tune a DeepSeek Coder LLM [44] that was pretrained on source code... LoRA Finetuning Model Used: deepseekcoder-1.5-7b-instruct... LoRA 33b Model Used for FT: deepseekcoder-33b-instruct... When filtering duplicate synthetic data, we employed an open code embedding model [69] available on Hugging Face (see the deduplication sketch below the table).
Experiment Setup | Yes | We present the dataset generation and training parameters in Table 2 and Table 3. These tables include parameters such as 'Sampling Temperature', 'LoRA Rank', 'LoRA α', 'Learning Rate', 'LR Schedule', 'Warmup Steps', 'Epoch', and 'Batchsize' (see the illustrative configuration sketch below the table).
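
For readers who want a concrete picture of what the reported fine-tuning setup might look like in code, the following is a minimal sketch assuming the Hugging Face transformers and peft libraries and the deepseek-ai/deepseek-coder-7b-instruct-v1.5 checkpoint. Every numeric hyperparameter value below is an illustrative placeholder, not the paper's actual setting (those are listed in its Table 2 and Table 3), and 'Sampling Temperature' governs generation rather than this training configuration.

    # Illustrative LoRA fine-tuning configuration, assuming the Hugging Face
    # transformers + peft stack. The checkpoint id and every numeric value are
    # placeholders, NOT the paper's settings (those are in its Tables 2 and 3).
    from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
    from peft import LoraConfig, get_peft_model

    base = "deepseek-ai/deepseek-coder-7b-instruct-v1.5"  # assumed repo id
    tokenizer = AutoTokenizer.from_pretrained(base)
    model = AutoModelForCausalLM.from_pretrained(base)

    lora = LoraConfig(
        r=16,                 # 'LoRA Rank' (placeholder)
        lora_alpha=32,        # 'LoRA α' (placeholder)
        lora_dropout=0.05,
        target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(model, lora)
    model.print_trainable_parameters()

    args = TrainingArguments(
        output_dir="pbe-lora",
        learning_rate=2e-4,             # 'Learning Rate' (placeholder)
        lr_scheduler_type="cosine",     # 'LR Schedule' (placeholder)
        warmup_steps=100,               # 'Warmup Steps' (placeholder)
        num_train_epochs=2,             # 'Epoch' (placeholder)
        per_device_train_batch_size=4,  # 'Batchsize' (placeholder)
        bf16=True,
    )
    # A transformers Trainer would then consume `args`, the LoRA-wrapped model,
    # and a tokenized dataset of synthetic (examples, program) pairs.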
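
The duplicate-filtering step is described only as using an open code embedding model [69] from Hugging Face. The sketch below shows one plausible way such filtering could be implemented, assuming the sentence-transformers library and the jinaai/jina-embeddings-v2-base-code checkpoint; the specific model id and the cosine-similarity threshold are assumptions for illustration, not details taken from the paper.

    # Hypothetical embedding-based deduplication of synthetic programs.
    # The model id and the 0.95 similarity threshold are assumptions.
    from sentence_transformers import SentenceTransformer, util

    embedder = SentenceTransformer("jinaai/jina-embeddings-v2-base-code",
                                   trust_remote_code=True)

    programs = [
        "def f(xs): return [x + 1 for x in xs]",
        "def f(xs): return list(map(lambda x: x + 1, xs))",
        "def f(s): return s.upper()",
    ]

    emb = embedder.encode(programs, convert_to_tensor=True,
                          normalize_embeddings=True)

    kept = []  # indices of programs retained after greedy deduplication
    for i in range(len(programs)):
        # keep a program only if it is not too similar to anything already kept
        if all(util.cos_sim(emb[i], emb[j]).item() < 0.95 for j in kept):
            kept.append(i)

    deduplicated = [programs[i] for i in kept]
    print(deduplicated)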