Explaining Datasets in Words: Statistical Models with Natural Language Parameters

Authors: Ruiqi Zhong, Heng Wang, Dan Klein, Jacob Steinhardt

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we benchmark our algorithm from Section 4; we later apply it to open-ended applications in Section 6. We run our algorithm on datasets where we know the ground truth predicates ϕ and evaluate whether it can recover them.
Researcher Affiliation | Academia | ruiqi-zhong@berkeley.edu, corresponding author. All authors are affiliated with UC Berkeley.
Pseudocode | Yes | Algorithm 1: A formal description of our algorithm.
Open Source Code | Yes | Our code and dataset are at https://github.com/ruiqi-zhong/nlparam
Open Datasets | Yes | Clustering. We consider five datasets: AGNews, DBPedia, NYT, Bills, and Wiki [40, 58, 32, 23].
Dataset Splits | No | No explicit training/test/validation dataset splits (e.g., specific percentages or sample counts for each split) are provided in the main text. The paper mentions overall dataset sizes such as '2048 examples from each' and '5,000 articles'.
Hardware Specification | Yes | All of the experiments run in Section 5 are estimated to cost at most 200 GPU hours on an A100 GPU with 40 GB of memory.
Software Dependencies | Yes | When running the algorithm, we generate candidate predicates in Discretize with gpt-3.5-turbo [37]; to perform the denotation operation ⟦ϕ⟧(x), we use flan-t5-xl [13]; we create the embedding for each sample x with the Instructor-xl model [48] and then normalize it with the ℓ2 norm. We use gpt-4o [36] to discretize and claude-3.5-sonnet [1] to compute denotations.
Experiment Setup | Yes | We set the number of candidates M returned by Discretize to 5 and the number of optimization iterations S to 10. To reduce noise due to randomness, we average the performance across five random seeds for each experiment.
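
As a rough illustration of the Software Dependencies row above, the sketch below shows how the denotation ⟦ϕ⟧(x) could be computed by posing a yes/no question to flan-t5-xl, and how sample embeddings could be produced with Instructor-xl and ℓ2-normalized. This is a minimal sketch under stated assumptions, not the authors' released code (see the nlparam repository for that): the prompt template, the instruction string, and the function names are placeholders I introduce here.

```python
import numpy as np
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
from InstructorEmbedding import INSTRUCTOR

# Denotation [[phi]](x): ask flan-t5-xl whether predicate phi holds for sample x.
# The prompt wording below is an assumption, not the paper's exact template.
tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-xl")
validator = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-xl")

def denotation(predicate: str, sample: str) -> int:
    prompt = (
        f"Text: {sample}\n"
        f"Is it true that this text {predicate}? Answer yes or no."
    )
    inputs = tokenizer(prompt, return_tensors="pt", truncation=True)
    output = validator.generate(**inputs, max_new_tokens=2)
    answer = tokenizer.decode(output[0], skip_special_tokens=True).strip().lower()
    return int(answer.startswith("yes"))  # 1 if phi holds for x, else 0

# Sample embeddings: Instructor-xl followed by l2 normalization.
# The instruction string is a placeholder; the paper only names the model and the norm.
embedder = INSTRUCTOR("hkunlp/instructor-xl")

def embed(samples: list[str]) -> np.ndarray:
    pairs = [["Represent the document for clustering:", s] for s in samples]
    emb = embedder.encode(pairs)                             # shape (n, d)
    return emb / np.linalg.norm(emb, axis=1, keepdims=True)  # unit l2 norm
```

flan-t5-xl serves here as an inexpensive binary validator; the same slot could in principle be filled by any instruction-following model, which is consistent with the gpt-4o / claude-3.5-sonnet variants mentioned in the same row.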
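
The Experiment Setup row amounts to three knobs: M = 5 candidates per Discretize call, S = 10 optimization iterations, and averaging over five random seeds. The scaffold below only shows where those knobs would enter a run; `run_experiment` is a stub returning a dummy score, not the paper's algorithm, and the constant names are mine.

```python
import statistics

NUM_CANDIDATES_M = 5     # candidates returned by each Discretize call
NUM_ITERATIONS_S = 10    # optimization iterations per run
SEEDS = [0, 1, 2, 3, 4]  # results are averaged over five random seeds

def run_experiment(seed: int, num_candidates: int, num_iterations: int) -> float:
    """Stub standing in for one full optimization run: a real run would spend
    `num_iterations` rounds proposing `num_candidates` predicates per round and
    keeping the best-scoring ones. Here we just return a deterministic dummy score."""
    return 0.5 + 0.01 * seed

mean_score = statistics.mean(
    run_experiment(seed, NUM_CANDIDATES_M, NUM_ITERATIONS_S) for seed in SEEDS
)
print(f"mean score over {len(SEEDS)} seeds: {mean_score:.3f}")
```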