Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Selective Generation for Controllable Language Models

Authors: Minjae Lee, Kyungmin Kim, Taesoo Kim, Sangdon Park

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Finally, we demonstrate the efficacy of the SGen family in achieving a desired FDR-E level with comparable selection efficiency to those from baselines on both open and closed source GLMs.
Researcher Affiliation | Academia | Minjae Lee (GSAI, POSTECH); Kyungmin Kim (GSAI, POSTECH); Taesoo Kim (SCS & SCP, Ga Tech); Sangdon Park (GSAI & CSE, POSTECH)
Pseudocode | Yes | Algorithm 1 Entailment Set Learning with a False Entailment Rate (FER) Guarantee
Open Source Code | Yes | Code and datasets are provided at https://github.com/ml-postech/selective-generation.
Open Datasets | Yes | We use two GLMs, GPT-3.5-Turbo and Alpaca-7B, alongside the Natural Questions (NQ) dataset to annotate entailment labels for question-answer pairs. [...] we create a dataset on textual entailment using the Natural Questions (NQ) dataset [17] for each GLM.
Dataset Splits | Yes | Approximately 7.3k (7,374) and 4.6k (4,595) samples are labeled for Alpaca-7B and GPT-3.5-Turbo, respectively, and both are split into calibration and test data at an 8:2 ratio.
Hardware Specification | Yes | Our system environment consists of 4 NVIDIA A100 80GB with 128 CPUs.
Software Dependencies | No | The paper mentions models like 'GPT-3.5-Turbo and Alpaca-7B' and 'deberta-v2-xxlarge-mnli' but does not list specific software dependencies with version numbers (e.g., Python, PyTorch, CUDA versions).
Experiment Setup | Yes | To control an FDR-E, we use two user-specified parameters (ε, δ), where we use (0.25, 0.02) unless specified. For our methods (i.e., SGen Semi, SGen Semi No MS, and SGen Semi-Sup No MS), we have five control parameters (ε_S, δ_S, δ_E, δ_W), where we map as follows: ε_S = ε, δ_S = (δ − δ_W)/2, δ_E = (δ − δ_W)/2, δ_W = 10^{-5}.
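
The parameter mapping quoted in the Experiment Setup row can be checked with a few lines of arithmetic. The sketch below is illustrative only and is not taken from the authors' repository. It assumes the extraction-damaged expressions read δ_S = δ_E = (δ − δ_W)/2 with δ_W = 10^{-5}; the function name `map_control_parameters` and the dictionary keys are hypothetical.

```python
import math


def map_control_parameters(eps, delta, delta_w=1e-5):
    """Map user-level (eps, delta) to per-component parameters.

    Assumed reading of the quoted mapping (the source text is garbled):
        eps_S = eps, delta_S = delta_E = (delta - delta_W) / 2.
    """
    if not 0.0 < delta_w < delta < 1.0:
        raise ValueError("expected 0 < delta_w < delta < 1")
    half = (delta - delta_w) / 2.0
    return {"eps_S": eps, "delta_S": half, "delta_E": half, "delta_W": delta_w}


# Default setting quoted in the row above: (eps, delta) = (0.25, 0.02).
params = map_control_parameters(0.25, 0.02)
for name, value in params.items():
    print(f"{name} = {value:.6g}")

# Under this reading the three delta terms add back up to the user-level delta.
total = params["delta_S"] + params["delta_E"] + params["delta_W"]
print("sum of deltas matches delta:", math.isclose(total, 0.02))
```

Under this reading, the three confidence terms sum to the user-specified δ, which is what one would expect if the overall failure probability is split across components. That interpretation is an inference from the arithmetic, not a statement made in the quoted text.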