Is Programming by Example Solved by LLMs?

Authors: Wen-Ding Li, Kevin Ellis

Venue: NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We experiment on classic domains such as lists and strings, and an uncommon graphics programming domain not well represented in typical pretraining data. We find that pretrained models are not effective at PBE, but that they can be fine-tuned for much higher performance, provided the test problems are in-distribution. We analyze empirically what causes these models to succeed and fail, and take steps toward understanding how to achieve better out-of-distribution generalization.
Researcher Affiliation | Academia | Wen-Ding Li, Cornell University (wl678@cornell.edu); Kevin Ellis, Cornell University (kellis@cornell.edu)
Pseudocode | No | The paper describes an 'algorithm' for adaptation and gives mathematical equations defining its steps, but it does not present them in a clearly labeled 'Pseudocode' or 'Algorithm' block of the kind typically seen in research papers.
Open Source Code | No | The paper mentions using and citing third-party open-source models and repositories (e.g., DeepSeek Coder, Jina embeddings, Fleet), but it does not provide a direct link to, or an explicit statement about, an open-source release of its own implementation code for the described methodology.
Open Datasets | Yes | For list functions we seed with 50 problems from Rule et al. 2024; for text editing, we consider seeding with either SyGuS or a 40-problem subset of PROSE; for LOGO we seed with 200 training-set problems in Wong et al. [43].
Dataset Splits | No | The paper mentions 'held-out test set' and 'test problems' but does not explicitly describe a distinct 'validation set' or 'validation split' as part of its data partitioning for experiments.
Hardware Specification | Yes | All experiments were performed on single-node machines (8x A6000 or 8x A100, etc.) without a multi-node distributed computing setup.
Software Dependencies | Yes | We fine-tune a DeepSeek Coder LLM [44] that was pretrained on source code... LoRA Finetuning Model Used: deepseekcoder-1.5-7b-instruct... LoRA 33b Model Used for FT: deepseekcoder-33b-instruct... When filtering duplicate synthetic data, we employed an open code embedding model [69] available on Hugging Face (see the deduplication sketch below the table).
Experiment Setup | Yes | We present the dataset generation and training parameters in Table 2 and Table 3. These tables include parameters such as 'Sampling Temperature', 'LoRA Rank', 'LoRA α', 'Learning Rate', 'LR Schedule', 'Warmup Steps', 'Epoch', and 'Batchsize' (see the illustrative configuration sketch below the table).
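
For readers who want a concrete picture of what the reported fine-tuning setup might look like in code, the following is a minimal sketch assuming the Hugging Face transformers and peft libraries and the deepseek-ai/deepseek-coder-7b-instruct-v1.5 checkpoint. Every numeric hyperparameter value below is an illustrative placeholder, not the paper's actual setting (those are listed in its Table 2 and Table 3), and 'Sampling Temperature' governs generation rather than this training configuration.

    # Illustrative LoRA fine-tuning configuration, assuming the Hugging Face
    # transformers + peft stack. The checkpoint id and every numeric value are
    # placeholders, NOT the paper's settings (those are in its Tables 2 and 3).
    from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
    from peft import LoraConfig, get_peft_model

    base = "deepseek-ai/deepseek-coder-7b-instruct-v1.5"  # assumed repo id
    tokenizer = AutoTokenizer.from_pretrained(base)
    model = AutoModelForCausalLM.from_pretrained(base)

    lora = LoraConfig(
        r=16,                 # 'LoRA Rank' (placeholder)
        lora_alpha=32,        # 'LoRA α' (placeholder)
        lora_dropout=0.05,
        target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(model, lora)
    model.print_trainable_parameters()

    args = TrainingArguments(
        output_dir="pbe-lora",
        learning_rate=2e-4,             # 'Learning Rate' (placeholder)
        lr_scheduler_type="cosine",     # 'LR Schedule' (placeholder)
        warmup_steps=100,               # 'Warmup Steps' (placeholder)
        num_train_epochs=2,             # 'Epoch' (placeholder)
        per_device_train_batch_size=4,  # 'Batchsize' (placeholder)
        bf16=True,
    )
    # A transformers Trainer would then consume `args`, the LoRA-wrapped model,
    # and a tokenized dataset of synthetic (examples, program) pairs.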
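
The duplicate-filtering step is described only as using an open code embedding model [69] from Hugging Face. The sketch below shows one plausible way such filtering could be implemented, assuming the sentence-transformers library and the jinaai/jina-embeddings-v2-base-code checkpoint; the specific model id and the cosine-similarity threshold are assumptions for illustration, not details taken from the paper.

    # Hypothetical embedding-based deduplication of synthetic programs.
    # The model id and the 0.95 similarity threshold are assumptions.
    from sentence_transformers import SentenceTransformer, util

    embedder = SentenceTransformer("jinaai/jina-embeddings-v2-base-code",
                                   trust_remote_code=True)

    programs = [
        "def f(xs): return [x + 1 for x in xs]",
        "def f(xs): return list(map(lambda x: x + 1, xs))",
        "def f(s): return s.upper()",
    ]

    emb = embedder.encode(programs, convert_to_tensor=True,
                          normalize_embeddings=True)

    kept = []  # indices of programs retained after greedy deduplication
    for i in range(len(programs)):
        # keep a program only if it is not too similar to anything already kept
        if all(util.cos_sim(emb[i], emb[j]).item() < 0.95 for j in kept):
            kept.append(i)

    deduplicated = [programs[i] for i in kept]
    print(deduplicated)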