Incentivizing Quality Text Generation via Statistical Contracts

Authors: Eden Saig, Ohad Einav, Inbal Talgam-Cohen

NeurIPS 2024

Reproducibility assessment (variable, result, and supporting LLM response):

- Research Type: Experimental. "We evaluate our framework empirically by deriving contracts for a range of objectives and LLM evaluation benchmarks, and find that cost-robust contracts sacrifice only a marginal increase in objective value compared to their cost-aware counterparts." ... "We evaluate the empirical performance of our cost-robust contracts using LLM evaluation benchmarks."

- Researcher Affiliation: Academia. "Eden Saig^1, Ohad Einav^1, Inbal Talgam-Cohen^{1,2}; ^1 Technion - Israel Institute of Technology; ^2 Tel Aviv University; {edens,ohadeinav,italgam}@cs.technion.ac.il"

- Pseudocode: No. The paper provides linear programs (LPs) and quadratic programs (QPs) in Appendix D.1, which formally define the optimization problems but do not constitute step-by-step pseudocode or algorithm blocks.

- Open Source Code: Yes. Implementation details are provided in Appendix E.3, and code is available at https://github.com/edensaig/llm-contracts.

- Open Datasets: Yes. "We use the LLM task of code-generation which has m = 2 outcomes: pass or fail. ... Datasets. We use evaluation data from two distinct benchmarks... The Mostly Basic Programming Problems (MBPP) benchmark [4]... The HumanEval benchmark [11]... The MT-Bench benchmark [41] is designed to evaluate the conversational and instruction-following abilities of LLMs in multi-turn (MT) conversational settings. ... We use energy data from the Hugging Face LLM Performance Leaderboard [22, 23]..."

- Dataset Splits: No. The paper does not explicitly provide training/validation/test dataset splits, percentages, or sample counts.

- Hardware Specification: Yes. "Hardware. All experiments were run on a single MacBook Pro laptop with 16GB of RAM, an M2 processor, and no GPU support."

- Software Dependencies: No. "Our code relies on cvxpy [12, 2] and Clarabel [16] for solving linear and quadratic programs." Specific version numbers for these dependencies are not provided in the text.

- Experiment Setup: No. The paper describes its general approach to solving the optimization problems and the types of contracts implemented, but it does not specify concrete hyperparameter values or system-level training settings such as learning rates, batch sizes, or optimizer configurations.
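To make the LP-based contract design concrete, here is a minimal pure-Python sketch of the simplest instance the paper's setting suggests: a binary-outcome (pass/fail) code-generation task where the principal pays per outcome and wants to incentivize the costly high-effort action at minimum expected payment. The function name and all numbers below are hypothetical illustrations, not taken from the paper, and the closed-form solution stands in for the cvxpy/Clarabel solve the authors' code performs.

```python
# Illustrative sketch (hypothetical numbers, not the paper's implementation).
# Binary-outcome min-pay contract: an agent picks low or high effort; the
# principal pays a non-negative transfer per outcome (limited liability)
# and wants high effort to be incentive-compatible at minimum expected cost.

def min_pay_binary_contract(p_low: float, p_high: float, cost_high: float):
    """Return (t_fail, t_pass) minimizing the principal's expected payment
    subject to incentive compatibility for the high-effort action.

    IC constraint:
        p_high*t_pass + (1-p_high)*t_fail - cost_high
            >= p_low*t_pass + (1-p_low)*t_fail
    With limited liability (transfers >= 0), the optimum sets t_fail = 0
    and makes the IC constraint tight:
        t_pass = cost_high / (p_high - p_low)
    """
    assert p_high > p_low, "high effort must raise the pass probability"
    t_pass = cost_high / (p_high - p_low)
    return 0.0, t_pass


# Hypothetical numbers: low effort passes w.p. 0.3, high effort w.p. 0.8,
# high effort costs 1 unit.
t_fail, t_pass = min_pay_binary_contract(p_low=0.3, p_high=0.8, cost_high=1.0)
expected_payment = 0.8 * t_pass + 0.2 * t_fail
print(t_pass, expected_payment)  # t_pass = 2.0, expected payment = 1.6
```

In the authors' code this kind of problem (and its cost-robust variants over richer outcome spaces) would be posed as an LP or QP and handed to cvxpy with the Clarabel backend; the closed form here is only viable because the two-outcome, two-action case collapses to a single tight constraint.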