Is Programming by Example Solved by LLMs?
Authors: Wen-Ding Li, Kevin Ellis
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We experiment on classic domains such as lists and strings, and an uncommon graphics programming domain not well represented in typical pretraining data. We find that pretrained models are not effective at PBE, but that they can be fine-tuned for much higher performance, provided the test problems are in-distribution. We analyze empirically what causes these models to succeed and fail, and take steps toward understanding how to achieve better out-of-distribution generalization. |
| Researcher Affiliation | Academia | Wen-Ding Li, Cornell University, wl678@cornell.edu; Kevin Ellis, Cornell University, kellis@cornell.edu |
| Pseudocode | No | The paper describes its adaptation algorithm in prose and defines its steps with mathematical equations, but it does not present them in a clearly labeled 'Pseudocode' or 'Algorithm' block of the kind typically seen in research papers. |
| Open Source Code | No | The paper mentions using and citing third-party open-source models and repositories (e.g., DeepSeek Coder, Jina embeddings, Fleet), but it does not provide a direct link or an explicit statement about releasing its own implementation code for the described methodology. |
| Open Datasets | Yes | For list functions we seed with 50 problems from Rule et al. 2024; for text editing, we consider seeding with either SyGuS or a 40-problem subset of PROSE; for LOGO we seed with 200 training-set problems in Wong et al. [43]. |
| Dataset Splits | No | The paper refers to a 'held-out test set' and 'test problems' but does not explicitly describe a separate validation set or validation split in its data partitioning for the experiments. |
| Hardware Specification | Yes | All experiments were performed on single-node machines (8x A6000 or 8x A100, etc.) without a multi-node distributed computing setup. |
| Software Dependencies | Yes | We fine-tune a DeepSeek Coder LLM [44] that was pretrained on source code... LoRA Finetuning Model Used: deepseekcoder-1.5-7b-instruct... LoRA 33b Model Used for FT: deepseekcoder-33b-instruct... When filtering duplicate synthetic data, we employed an open code embedding model [69] available on Hugging Face (see the embedding-based deduplication sketch after the table). |
| Experiment Setup | Yes | We present the dataset generation and training parameters in Table 2 and Table 3. These tables include parameters such as 'Sampling Temperature', 'LoRA Rank', 'LoRA α', 'Learning Rate', 'LR Schedule', 'Warmup Steps', 'Epoch', and 'Batchsize' (an illustrative LoRA fine-tuning configuration sketch follows the table). |
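
The rows above report that the authors fine-tune DeepSeek Coder checkpoints with LoRA and list the training hyperparameters by name (Tables 2 and 3 of the paper; the values themselves are not reproduced here). The following is a minimal sketch of what such a setup looks like with Hugging Face `transformers` and `peft`; the checkpoint name, target modules, and every numeric value are placeholders chosen for illustration, not the paper's settings.

```python
# Minimal sketch (not the authors' training script): a LoRA fine-tune of a
# DeepSeek Coder checkpoint with Hugging Face peft + transformers.
from transformers import AutoModelForCausalLM, AutoTokenizer, Trainer, TrainingArguments
from peft import LoraConfig, get_peft_model

BASE = "deepseek-ai/deepseek-coder-6.7b-instruct"  # stand-in; the paper fine-tunes 7b and 33b instruct variants

tokenizer = AutoTokenizer.from_pretrained(BASE)
model = AutoModelForCausalLM.from_pretrained(BASE)

# Parameter names mirror the Experiment Setup row: LoRA Rank, LoRA alpha,
# Learning Rate, LR Schedule, Warmup Steps, Epoch, Batchsize.
# All numbers below are placeholders, not the paper's values.
lora_cfg = LoraConfig(
    r=16,                                                   # LoRA Rank
    lora_alpha=32,                                           # LoRA alpha
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_cfg)

training_args = TrainingArguments(
    output_dir="pbe-lora-checkpoints",
    learning_rate=2e-4,             # Learning Rate
    lr_scheduler_type="cosine",     # LR Schedule
    warmup_steps=100,               # Warmup Steps
    num_train_epochs=2,             # Epoch
    per_device_train_batch_size=4,  # Batchsize (per device)
)

# A Trainer would additionally need a tokenized dataset of
# (input-output examples -> program) pairs:
# trainer = Trainer(model=model, args=training_args, train_dataset=train_ds)
# trainer.train()
```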
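
The Software Dependencies row also notes that duplicate synthetic training programs were filtered with an open code embedding model [69] from Hugging Face (the paper cites Jina embeddings). Below is a minimal sketch of embedding-based deduplication; the specific model name, similarity threshold, and greedy keep-or-drop strategy are assumptions for illustration, not details taken from the paper.

```python
# Minimal sketch (not the authors' code): drop near-duplicate synthetic programs
# by thresholding cosine similarity of code embeddings.
from sentence_transformers import SentenceTransformer, util

def filter_duplicates(programs, threshold=0.95):
    """Greedily keep a program only if it is not too similar to any already-kept one."""
    model = SentenceTransformer("jinaai/jina-embeddings-v2-base-code",
                                trust_remote_code=True)
    embeddings = model.encode(programs, convert_to_tensor=True,
                              normalize_embeddings=True)
    kept_idx, kept_emb = [], []
    for i, emb in enumerate(embeddings):
        if all(util.cos_sim(emb, e).item() < threshold for e in kept_emb):
            kept_idx.append(i)
            kept_emb.append(emb)
    return [programs[i] for i in kept_idx]

if __name__ == "__main__":
    synthetic = [
        "def f(xs): return [x + 1 for x in xs]",
        "def f(xs): return [x+1 for x in xs]",   # near-duplicate of the first
        "def f(xs): return sorted(xs)[::-1]",
    ]
    print(filter_duplicates(synthetic))
```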