Distribution Learning with Valid Outputs Beyond the Worst-Case
Authors: Nicholas Rittler, Kamalika Chaudhuri
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | Prior work on this problem takes a worst-case stance, showing that proper learning requires an exponential number of validity queries, and demonstrating an improper algorithm which, while generating guarantees in a wide range of settings, makes an atypical polynomial number of validity queries. In this work, we take a first step towards characterizing regimes where guaranteeing validity is easier than in the worst case. We show that when the data distribution lies in the model class and the log-loss is minimized, the number of samples required to ensure validity has a weak dependence on the validity requirement. Additionally, we show that when the validity region belongs to a VC-class, a limited number of validity queries are often sufficient. |
| Researcher Affiliation | Academia | Nick Rittler University of California San Diego nrittler@ucsd.edu Kamalika Chaudhuri University of California San Diego kamalika@cs.ucsd.edu |
| Pseudocode | Yes | Algorithm 1 Modifying ERM to Yield Log-Loss Guarantees, Algorithm 2 Post-Hoc Restriction of ERM to an Estimate of Valid Outputs, Algorithm 3 Restriction to ERM under Log-Loss without Validity Assumption |
| Open Source Code | No | The paper is theoretical and does not mention making any source code available. The NeurIPS checklist in the paper confirms this with "NA" for open access to data and code. |
| Open Datasets | No | The paper is theoretical and does not refer to any specific publicly available datasets for experimental training. The NeurIPS checklist in the paper states "NA" for experimental reproducibility. |
| Dataset Splits | No | The paper is theoretical and does not describe any dataset splits for validation. The NeurIPS checklist in the paper states "NA" for experimental reproducibility. |
| Hardware Specification | No | The paper is theoretical and does not describe any specific hardware used for experiments. The NeurIPS checklist in the paper states "NA" for experiments compute resources. |
| Software Dependencies | No | The paper is theoretical and does not describe any specific software dependencies with version numbers. The NeurIPS checklist in the paper states "NA" for experiments. |
| Experiment Setup | No | The paper is theoretical and does not describe any experimental setup details such as hyperparameters or training configurations. The NeurIPS checklist in the paper states "NA" for experiments. |
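The paper provides its algorithms only as pseudocode, with no accompanying implementation. As a rough illustration of the idea behind Algorithm 2 (post-hoc restriction of a learned distribution to an estimate of the valid outputs), here is a hypothetical sketch over a finite domain; the validity oracle, the finite domain, and the empirical estimator are illustrative assumptions for this sketch, not the authors' construction.

```python
# Hypothetical sketch of post-hoc restriction: learn a distribution,
# zero out outputs the validity oracle rejects, and renormalize.
# The finite domain and empirical (ERM-style) estimator are
# simplifying assumptions, not the paper's actual setting.

from collections import Counter

def fit_empirical(samples, domain):
    """Empirical estimate of a distribution over a finite domain."""
    counts = Counter(samples)
    n = len(samples)
    return {x: counts[x] / n for x in domain}

def restrict_to_valid(p, validity_oracle):
    """Restrict the estimated distribution to valid outputs and renormalize."""
    restricted = {x: (q if validity_oracle(x) else 0.0) for x, q in p.items()}
    mass = sum(restricted.values())
    if mass == 0.0:
        raise ValueError("no estimated probability mass on valid outputs")
    return {x: q / mass for x, q in restricted.items()}

if __name__ == "__main__":
    domain = [0, 1, 2, 3]
    samples = [0, 0, 1, 2, 2, 2, 3, 3]
    p = fit_empirical(samples, domain)
    q = restrict_to_valid(p, lambda x: x != 3)  # toy validity region {0, 1, 2}
    print(q)  # all remaining mass lies on valid outputs and sums to 1
```

The renormalization step is what makes the restricted distribution a proper distribution again; the theoretical question the paper studies is how many samples and validity queries this kind of guarantee requires.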