Incentivizing Quality Text Generation via Statistical Contracts
Authors: Eden Saig, Ohad Einav, Inbal Talgam-Cohen
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our framework empirically by deriving contracts for a range of objectives and LLM evaluation benchmarks, and find that cost-robust contracts sacrifice only a marginal increase in objective value compared to their cost-aware counterparts. We evaluate the empirical performance of our cost-robust contracts using LLM evaluation benchmarks. |
| Researcher Affiliation | Academia | Eden Saig1, Ohad Einav1, Inbal Talgam-Cohen1,2 1 Technion Israel Institute of Technology 2 Tel Aviv University {edens,ohadeinav,italgam}@cs.technion.ac.il |
| Pseudocode | No | The paper provides Linear Programs (LPs) and Quadratic Programs (QPs) in Appendix D.1, which formally define optimization problems but do not present step-by-step pseudocode or algorithm blocks. |
| Open Source Code | Yes | Implementation details are provided in Appendix E.3, and code is available at: https://github.com/edensaig/llm-contracts. |
| Open Datasets | Yes | We use the LLM task of code-generation, which has m = 2 outcomes: pass or fail. ... Datasets. We use evaluation data from two distinct benchmarks... The Mostly Basic Programming Problems (MBPP) benchmark [4]... The Human Eval benchmark [11]... The MT-Bench benchmark [41] is designed to evaluate the conversational and instruction-following abilities of LLMs in multi-turn (MT) conversational settings. ... We use energy data from the Hugging Face LLM Performance Leaderboard [22, 23]... |
| Dataset Splits | No | The paper does not explicitly provide training/test/validation dataset splits, percentages, or sample counts. |
| Hardware Specification | Yes | Hardware. All experiments were run on a single MacBook Pro laptop with 16 GB of RAM, an M2 processor, and no GPU support. |
| Software Dependencies | No | Our code relies on cvxpy [12, 2] and Clarabel [16] for solving linear and quadratic programs. The specific version numbers for these software dependencies are not explicitly provided in the text. |
| Experiment Setup | No | The paper describes the general approach to solving optimization problems and the types of contracts implemented, but it does not specify concrete hyperparameter values or detailed system-level training settings such as learning rates, batch sizes, or optimizer configurations. |