reproducibilityindex.ai

Selective Generation for Controllable Language Models

Authors: Minjae Lee, Kyungmin Kim, Taesoo Kim, Sangdon Park

NeurIPS 2024

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Finally, we demonstrate the efficacy of the SGen family in achieving a desired FDR-E level with comparable selection efficiency to those from baselines on both open and closed source GLMs.
Researcher Affiliation	Academia	Minjae Lee GSAI POSTECH minjae.lee@postech.ac.kr Kyungmin Kim GSAI POSTECH kkm959595@postech.ac.kr Taesoo Kim SCS & SCP Ga Tech taesoo@gatech.edu Sangdon Park GSAI & CSE POSTECH sangdon@postech.ac.kr
Pseudocode	Yes	Algorithm 1 Entailment Set Learning with a False Entailment Rate (FER) Guarantee
Open Source Code	Yes	Code and datasets are provided at https://github.com/ml-postech/selective-generation.
Open Datasets	Yes	We use two GLMs, GPT-3.5-Turbo and Alpaca-7B, alongside the Natural Questions (NQ) dataset to annotate entailment labels for question-answer pairs. [...] we create a dataset on textual entailment using the Natural Questions (NQ) dataset [17] for each GLM.
Dataset Splits	Yes	Approximately 7.3k (7,374) and 4.6k (4,595) samples are labeled for Alpaca-7B and GPT-3.5-Turbo, respectively, and both are split into calibration and test data at an 8:2 ratio.
Hardware Specification	Yes	Our system environment consists of 4 NVIDIA A100 80GB with 128 CPUs.
Software Dependencies	No	The paper mentions models like 'GPT-3.5-Turbo and Alpaca-7B' and 'deberta-v2-xxlarge-mnli' but does not list specific software dependencies with version numbers (e.g., Python, PyTorch, CUDA versions).
Experiment Setup	Yes	To control an FDR-E, we use two user-specified parameters (ε, δ), where we use (0.25, 0.02) unless specified. For our methods (i.e., SGen Semi, SGen Semi No MS, and SGen Semi-Sup No MS), we have five control parameters (ε_S, δ_S, δ_E, δ_W), where we map as follows: ε_S = ε, δ_S = (δ − δ_W)/2, δ_E = (δ − δ_W)/2, δ_W = 10^(−5).
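
The parameter mapping in the Experiment Setup row and the 8:2 split in the Dataset Splits row can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the names `SGenParams`, `map_params`, and `split_sizes` are hypothetical, the mapping is read as δ_S = δ_E = (δ − δ_W)/2 with δ_W = 10^(−5) (an assumption chosen so that the three δ terms sum to δ), and the rounding rule for the split is also an assumption since the paper excerpt does not state one.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class SGenParams:
    """Hypothetical container for the control parameters named in the table."""
    eps_s: float    # epsilon_S: target FDR-E level
    delta_s: float  # delta_S: confidence budget for selection
    delta_e: float  # delta_E: confidence budget for entailment set learning
    delta_w: float  # delta_W: small remaining confidence budget


def map_params(eps: float = 0.25, delta: float = 0.02,
               delta_w: float = 1e-5) -> SGenParams:
    """Map user-specified (eps, delta) to the control parameters.

    Assumed reading: delta_S = delta_E = (delta - delta_W) / 2,
    so that delta_S + delta_E + delta_W == delta.
    """
    if not (0.0 < delta_w < delta < 1.0):
        raise ValueError("expected 0 < delta_w < delta < 1")
    half = (delta - delta_w) / 2
    return SGenParams(eps_s=eps, delta_s=half, delta_e=half, delta_w=delta_w)


def split_sizes(n: int, cal_parts: int = 8, total_parts: int = 10) -> Tuple[int, int]:
    """Return (calibration, test) sizes for an 8:2 split.

    Integer arithmetic avoids floating-point surprises; rounding the
    calibration size down is an assumption, not taken from the paper.
    """
    n_cal = n * cal_parts // total_parts
    return n_cal, n - n_cal


if __name__ == "__main__":
    params = map_params()
    print(params)
    # Labeled sample counts reported in the Dataset Splits row.
    for name, n in [("Alpaca-7B", 7374), ("GPT-3.5-Turbo", 4595)]:
        print(name, split_sizes(n))
```

Under this reading, the default (0.25, 0.02) setting spends almost all of δ evenly on δ_S and δ_E, with δ_W taking a negligible share. The exact split sizes used in the released code may differ by a sample depending on how the authors round.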