Explaining Datasets in Words: Statistical Models with Natural Language Parameters

Authors: Ruiqi Zhong, Heng Wang, Dan Klein, Jacob Steinhardt

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we benchmark our algorithm from Section 4; we later apply it to open-ended applications in Section 6. We run our algorithm on datasets where we know the ground truth predicates ϕ and evaluate whether it can recover them.
Researcher Affiliation | Academia | ruiqi-zhong@berkeley.edu, corresponding author. All authors are affiliated with UC Berkeley.
Pseudocode | Yes | Algorithm 1: A formal description of our algorithm.
Open Source Code | Yes | Our code and dataset are at https://github.com/ruiqi-zhong/nlparam
Open Datasets | Yes | Clustering. We consider five datasets: AGNews, DBPedia, NYT, Bills, and Wiki [40, 58, 32, 23].
Dataset Splits | No | No explicit training/test/validation dataset splits (e.g., specific percentages or sample counts for each split) are provided in the main text. The paper mentions overall dataset sizes such as '2048 examples from each' and '5,000 articles'.
Hardware Specification | Yes | All of the experiments run in Section 5 are estimated to cost at most 200 GPU hours on an A100 GPU with 40 GB of memory.
Software Dependencies | Yes | When running the algorithm, we generate candidate predicates in Discretize with gpt-3.5-turbo [37]; to perform the denotation operation ⟦ϕ⟧(x), we use flan-t5-xl [13]; we create the embedding for each sample x with the Instructor-xl model [48] and then normalize it with the ℓ2 norm. We use gpt-4o [36] to discretize and claude-3.5-sonnet [1] to compute denotations.
Experiment Setup | Yes | We set the number of candidates M returned by Discretize to 5 and the number of optimization iterations S to 10. To reduce noise due to randomness, we average the performance across five random seeds for each experiment.
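
As a rough illustration of the Software Dependencies row above, the sketch below shows how the denotation ⟦ϕ⟧(x) could be computed by posing a yes/no question to flan-t5-xl, and how sample embeddings could be produced with Instructor-xl and ℓ2-normalized. This is a minimal sketch under stated assumptions, not the authors' released code (see the nlparam repository for that): the prompt template, the instruction string, and the function names are placeholders I introduce here.

```python
import numpy as np
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
from InstructorEmbedding import INSTRUCTOR

# Denotation [[phi]](x): ask flan-t5-xl whether predicate phi holds for sample x.
# The prompt wording below is an assumption, not the paper's exact template.
tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-xl")
validator = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-xl")

def denotation(predicate: str, sample: str) -> int:
    prompt = (
        f"Text: {sample}\n"
        f"Is it true that this text {predicate}? Answer yes or no."
    )
    inputs = tokenizer(prompt, return_tensors="pt", truncation=True)
    output = validator.generate(**inputs, max_new_tokens=2)
    answer = tokenizer.decode(output[0], skip_special_tokens=True).strip().lower()
    return int(answer.startswith("yes"))  # 1 if phi holds for x, else 0

# Sample embeddings: Instructor-xl followed by l2 normalization.
# The instruction string is a placeholder; the paper only names the model and the norm.
embedder = INSTRUCTOR("hkunlp/instructor-xl")

def embed(samples: list[str]) -> np.ndarray:
    pairs = [["Represent the document for clustering:", s] for s in samples]
    emb = embedder.encode(pairs)                             # shape (n, d)
    return emb / np.linalg.norm(emb, axis=1, keepdims=True)  # unit l2 norm
```

flan-t5-xl serves here as an inexpensive binary validator; the same slot could in principle be filled by any instruction-following model, which is consistent with the gpt-4o / claude-3.5-sonnet variants mentioned in the same row.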
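
The Experiment Setup row amounts to three knobs: M = 5 candidates per Discretize call, S = 10 optimization iterations, and averaging over five random seeds. The scaffold below only shows where those knobs would enter a run; `run_experiment` is a stub returning a dummy score, not the paper's algorithm, and the constant names are mine.

```python
import statistics

NUM_CANDIDATES_M = 5     # candidates returned by each Discretize call
NUM_ITERATIONS_S = 10    # optimization iterations per run
SEEDS = [0, 1, 2, 3, 4]  # results are averaged over five random seeds

def run_experiment(seed: int, num_candidates: int, num_iterations: int) -> float:
    """Stub standing in for one full optimization run: a real run would spend
    `num_iterations` rounds proposing `num_candidates` predicates per round and
    keeping the best-scoring ones. Here we just return a deterministic dummy score."""
    return 0.5 + 0.01 * seed

mean_score = statistics.mean(
    run_experiment(seed, NUM_CANDIDATES_M, NUM_ITERATIONS_S) for seed in SEEDS
)
print(f"mean score over {len(SEEDS)} seeds: {mean_score:.3f}")
```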