Explaining Datasets in Words: Statistical Models with Natural Language Parameters
Authors: Ruiqi Zhong, Heng Wang, Dan Klein, Jacob Steinhardt
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we benchmark our algorithm from Section 4; we later apply it to open-ended applications in Section 6. We run our algorithm on datasets where we know the ground truth predicates ϕ and evaluate whether it can recover them. |
| Researcher Affiliation | Academia | ruiqi-zhong@berkeley.edu, corresponding author. All authors affiliated with UC Berkeley. |
| Pseudocode | Yes | Algorithm 1 A formal description of our algorithm. |
| Open Source Code | Yes | Our code and dataset are at https://github.com/ruiqi-zhong/nlparam |
| Open Datasets | Yes | Clustering. We consider five datasets, AGNews, DBPedia, NYT, Bills, and Wiki [40, 58, 32, 23]. |
| Dataset Splits | No | No explicit training/test/validation dataset splits (e.g., specific percentages or sample counts for each split) are provided in the main text. The paper mentions overall dataset sizes like '2048 examples from each' and '5,000 articles'. |
| Hardware Specification | Yes | All of the experiments run in Section 5 are estimated to cost at most 200 GPU hours on an A100 GPU with 40GB memory |
| Software Dependencies | Yes | When running the algorithm, we generate candidate predicates in Discretize with gpt-3.5-turbo [37]; to perform the denotation operation ⟦ϕ⟧(x), we use flan-t5-xl [13]; we create the embedding for each sample x with the Instructor-xl model [48] and then normalize it with ℓ2 norm. We use gpt-4o [36] to discretize and claude-3.5-sonnet [1] to compute denotations. |
| Experiment Setup | Yes | We set the number of candidates M returned by Discretize to be 5 and the number of optimization iterations S to be 10. To reduce noise from randomness, we average the performance of five random seeds for each experiment. |
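
The Software Dependencies row describes two per-sample operations: evaluating a natural-language predicate ϕ on a sample x with flan-t5-xl (the denotation ⟦ϕ⟧(x)), and embedding x with Instructor-xl followed by ℓ2 normalization. The sketch below is a minimal illustration of those operations, not the authors' released nlparam code; the yes/no prompt and the Instructor instruction string are assumptions.

```python
# Minimal sketch (not the authors' code) of the per-sample operations quoted in
# the Software Dependencies row: predicate denotation with flan-t5-xl and
# l2-normalized Instructor-xl embeddings.
import numpy as np
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
from InstructorEmbedding import INSTRUCTOR

t5_tok = AutoTokenizer.from_pretrained("google/flan-t5-xl")
t5 = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-xl")
embedder = INSTRUCTOR("hkunlp/instructor-xl")

def denotation(phi: str, x: str) -> bool:
    """Approximate [[phi]](x) by asking flan-t5-xl a yes/no question.
    The prompt wording is an assumption, not taken from the paper."""
    prompt = (
        f"Predicate: {phi}\nText: {x}\n"
        "Does the predicate hold for the text? Answer yes or no."
    )
    inputs = t5_tok(prompt, return_tensors="pt", truncation=True)
    out = t5.generate(**inputs, max_new_tokens=3)
    answer = t5_tok.decode(out[0], skip_special_tokens=True)
    return answer.strip().lower().startswith("yes")

def embed(x: str) -> np.ndarray:
    """Instructor-xl embedding of x, normalized to unit l2 norm.
    The instruction string is an assumption."""
    vec = embedder.encode([["Represent the document for clustering:", x]])[0]
    return vec / np.linalg.norm(vec)
```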
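The Experiment Setup row pins three choices: M = 5 candidates per Discretize call, S = 10 optimization iterations, and averaging performance over five random seeds. The helper below is a hypothetical sketch of that seed-averaging protocol; `run_once` stands in for a full run of the algorithm and is not a function from the released code.

```python
# Hypothetical sketch of the seed-averaging protocol from the Experiment Setup row.
from statistics import mean

NUM_CANDIDATES_M = 5     # candidates returned by Discretize per call
NUM_ITERATIONS_S = 10    # optimization iterations S
SEEDS = (0, 1, 2, 3, 4)  # five random seeds; reported numbers are the mean

def average_over_seeds(run_once, seeds=SEEDS):
    """Average a scalar metric over seeds.

    `run_once(seed)` is a hypothetical callable that runs the full algorithm
    (with M = NUM_CANDIDATES_M and S = NUM_ITERATIONS_S) and returns one score.
    """
    return mean(run_once(seed) for seed in seeds)
```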