Incentivizing Quality Text Generation via Statistical Contracts

Authors: Eden Saig, Ohad Einav, Inbal Talgam-Cohen

NeurIPS 2024

Reproducibility assessment (variable, result, and supporting LLM response):

- Research Type: Experimental. "We evaluate our framework empirically by deriving contracts for a range of objectives and LLM evaluation benchmarks, and find that cost-robust contracts sacrifice only a marginal increase in objective value compared to their cost-aware counterparts." ... "We evaluate the empirical performance of our cost-robust contracts using LLM evaluation benchmarks."

- Researcher Affiliation: Academia. "Eden Saig^1, Ohad Einav^1, Inbal Talgam-Cohen^{1,2}; ^1 Technion - Israel Institute of Technology; ^2 Tel Aviv University; {edens,ohadeinav,italgam}@cs.technion.ac.il"

- Pseudocode: No. The paper provides linear programs (LPs) and quadratic programs (QPs) in Appendix D.1, which formally define the optimization problems but do not constitute step-by-step pseudocode or algorithm blocks.

- Open Source Code: Yes. Implementation details are provided in Appendix E.3, and code is available at https://github.com/edensaig/llm-contracts.

- Open Datasets: Yes. "We use the LLM task of code-generation which has m = 2 outcomes: pass or fail. ... Datasets. We use evaluation data from two distinct benchmarks... The Mostly Basic Programming Problems (MBPP) benchmark [4]... The HumanEval benchmark [11]... The MT-Bench benchmark [41] is designed to evaluate the conversational and instruction-following abilities of LLMs in multi-turn (MT) conversational settings. ... We use energy data from the Hugging Face LLM Performance Leaderboard [22, 23]..."

- Dataset Splits: No. The paper does not explicitly provide training/validation/test dataset splits, percentages, or sample counts.

- Hardware Specification: Yes. "Hardware. All experiments were run on a single MacBook Pro laptop with 16GB of RAM, an M2 processor, and no GPU support."

- Software Dependencies: No. "Our code relies on cvxpy [12, 2] and Clarabel [16] for solving linear and quadratic programs." Specific version numbers for these dependencies are not provided in the text.

- Experiment Setup: No. The paper describes its general approach to solving the optimization problems and the types of contracts implemented, but it does not specify concrete hyperparameter values or system-level training settings such as learning rates, batch sizes, or optimizer configurations.
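To make the LP-based contract design concrete, here is a minimal pure-Python sketch of the simplest instance the paper's setting suggests: a binary-outcome (pass/fail) code-generation task where the principal pays per outcome and wants to incentivize the costly high-effort action at minimum expected payment. The function name and all numbers below are hypothetical illustrations, not taken from the paper, and the closed-form solution stands in for the cvxpy/Clarabel solve the authors' code performs.

```python
# Illustrative sketch (hypothetical numbers, not the paper's implementation).
# Binary-outcome min-pay contract: an agent picks low or high effort; the
# principal pays a non-negative transfer per outcome (limited liability)
# and wants high effort to be incentive-compatible at minimum expected cost.

def min_pay_binary_contract(p_low: float, p_high: float, cost_high: float):
    """Return (t_fail, t_pass) minimizing the principal's expected payment
    subject to incentive compatibility for the high-effort action.

    IC constraint:
        p_high*t_pass + (1-p_high)*t_fail - cost_high
            >= p_low*t_pass + (1-p_low)*t_fail
    With limited liability (transfers >= 0), the optimum sets t_fail = 0
    and makes the IC constraint tight:
        t_pass = cost_high / (p_high - p_low)
    """
    assert p_high > p_low, "high effort must raise the pass probability"
    t_pass = cost_high / (p_high - p_low)
    return 0.0, t_pass


# Hypothetical numbers: low effort passes w.p. 0.3, high effort w.p. 0.8,
# high effort costs 1 unit.
t_fail, t_pass = min_pay_binary_contract(p_low=0.3, p_high=0.8, cost_high=1.0)
expected_payment = 0.8 * t_pass + 0.2 * t_fail
print(t_pass, expected_payment)  # t_pass = 2.0, expected payment = 1.6
```

In the authors' code this kind of problem (and its cost-robust variants over richer outcome spaces) would be posed as an LP or QP and handed to cvxpy with the Clarabel backend; the closed form here is only viable because the two-outcome, two-action case collapses to a single tight constraint.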