Can Large Language Models Reason about Program Invariants?
Authors: Kexin Pei, David Bieber, Kensen Shi, Charles Sutton, Pengcheng Yin
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To evaluate our proposed invariant generation methods, we perform a series of experiments on programs obtained from competitive programming contests (Section 4.1). Our primary baseline is Daikon, which generates the ground truth for training and evaluation by executing the programs on hundreds of possible inputs. ... We fine-tune LMs using an initial learning rate of 0.001, and a cosine learning rate decay schedule for 20,000 steps on 64 TPU v4 cores. The batch size is 128. ... We report Jaccard similarity, precision, recall, and F1 score at the level of invariants. (A hedged sketch of these set-level metrics appears after the table.) |
| Researcher Affiliation | Collaboration | ¹Columbia University, ²Google Research, Brain Team. Correspondence to: Kexin Pei <kpei@cs.columbia.edu>, David Bieber <dbieber@google.com>, Charles Sutton <charlessutton@google.com>. |
| Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | No | The paper does not provide an unambiguous statement or a direct link to the open-source code for the methodology described. |
| Open Datasets | Yes | We evaluate our models on the Java submissions in the Code Contests dataset (Li et al., 2022), which consists of millions of submissions to about four thousand distinct programming challenges; the dataset provides upwards of 200 inputs for each problem. |
| Dataset Splits | Yes | In total, the resulting Code Contests Java Invariants dataset includes 1,600,158 training, 86,346 validation, and 24,509 test examples. |
| Hardware Specification | Yes | We fine-tune LMs using an initial learning rate of 0.001, and a cosine learning rate decay schedule for 20,000 steps on 64 TPU v4 cores. |
| Software Dependencies | No | The paper mentions tools like 'Daikon' but does not provide specific version numbers for any software dependencies or libraries required for replication. |
| Experiment Setup | Yes | We fine-tune LMs using an initial learning rate of 0.001, and a cosine learning rate decay schedule for 20,000 steps on 64 TPU v4 cores. The batch size is 128. (A hedged sketch of this learning-rate schedule appears immediately after the table.) |
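
The experiment-setup row above reports an initial learning rate of 0.001 decayed with a cosine schedule over 20,000 steps at batch size 128. The snippet below is a minimal sketch of such a schedule, not the authors' training code; the paper does not state a warmup phase, final learning rate, or optimizer, so this assumes no warmup and decay to zero.

```python
import math

# Minimal sketch of the reported cosine learning-rate schedule:
# initial LR 0.001 decayed over 20,000 steps (batch size 128).
# Assumptions (not stated in the paper): no warmup, decay to zero.
INITIAL_LR = 1e-3
TOTAL_STEPS = 20_000


def cosine_lr(step: int,
              initial_lr: float = INITIAL_LR,
              total_steps: int = TOTAL_STEPS) -> float:
    """Cosine decay: starts at initial_lr and reaches zero at total_steps."""
    progress = min(step, total_steps) / total_steps
    return 0.5 * initial_lr * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    for step in (0, 5_000, 10_000, 20_000):
        print(f"step {step:>6}: lr = {cosine_lr(step):.6f}")
```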
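
The paper evaluates predicted invariants against Daikon's ground truth using Jaccard similarity, precision, recall, and F1 at the level of invariants. The sketch below illustrates how these set-level metrics could be computed for a single program; it is not the authors' evaluation code, and representing invariants as normalized strings is a simplifying assumption.

```python
# Sketch of set-level invariant metrics (Jaccard, precision, recall, F1).
# Assumes predicted and ground-truth invariants are normalized strings;
# this is an illustration, not the authors' evaluation code.

def invariant_metrics(predicted: set[str], ground_truth: set[str]) -> dict[str, float]:
    """Compute Jaccard similarity, precision, recall, and F1 for one program."""
    intersection = predicted & ground_truth
    union = predicted | ground_truth

    jaccard = len(intersection) / len(union) if union else 1.0
    precision = len(intersection) / len(predicted) if predicted else 0.0
    recall = len(intersection) / len(ground_truth) if ground_truth else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom > 0 else 0.0
    return {"jaccard": jaccard, "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Hypothetical invariants, for illustration only.
    predicted = {"x >= 0", "x <= n", "arr != null"}
    ground_truth = {"x >= 0", "arr != null", "n > 0"}
    print(invariant_metrics(predicted, ground_truth))
```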