Distribution Learning with Valid Outputs Beyond the Worst-Case

Authors: Nicholas Rittler, Kamalika Chaudhuri

NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | Prior work on this problem takes a worst-case stance, showing that proper learning requires an exponential number of validity queries, and demonstrating an improper algorithm which, while generating guarantees in a wide range of settings, makes an atypical polynomial number of validity queries. In this work, we take a first step towards characterizing regimes where guaranteeing validity is easier than in the worst case. We show that when the data distribution lies in the model class and the log-loss is minimized, the number of samples required to ensure validity has a weak dependence on the validity requirement. Additionally, we show that when the validity region belongs to a VC-class, a limited number of validity queries are often sufficient.
Researcher Affiliation | Academia | Nick Rittler, University of California San Diego (nrittler@ucsd.edu); Kamalika Chaudhuri, University of California San Diego (kamalika@cs.ucsd.edu)
Pseudocode | Yes | Algorithm 1: Modifying ERM to Yield Log-Loss Guarantees; Algorithm 2: Post-Hoc Restriction of ERM to an Estimate of Valid Outputs; Algorithm 3: Restriction to ERM under Log-Loss without Validity Assumption
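The post-hoc restriction idea behind Algorithm 2 can be illustrated with a minimal toy sketch; this is an assumption-laden illustration, not the paper's actual procedure. It fits an empirical distribution over a finite support by log-loss minimization (the MLE), then zeroes out mass on outputs an assumed validity oracle rejects and renormalizes. The function name `restrict_to_valid` and the toy oracle are hypothetical.

```python
import numpy as np

def restrict_to_valid(probs, is_valid):
    """Post-hoc restriction (toy version): zero out probability mass on
    invalid outputs, then renormalize over the estimated valid region."""
    mask = np.array([is_valid(x) for x in range(len(probs))], dtype=float)
    restricted = probs * mask
    total = restricted.sum()
    if total == 0:
        raise ValueError("no probability mass on valid outputs")
    return restricted / total

# Toy setting: support {0, ..., 4}, samples drawn from an unknown distribution.
samples = np.array([0, 0, 1, 1, 1, 2, 3, 3])
counts = np.bincount(samples, minlength=5)
mle = counts / counts.sum()      # empirical distribution = log-loss minimizer
is_valid = lambda x: x != 2      # hypothetical validity oracle: output 2 invalid
restricted = restrict_to_valid(mle, is_valid)
```

After restriction, `restricted` places zero mass on the invalid output and sums to one; in the worst case the valid region captures little of the learned mass, which is where the paper's query and sample complexity questions arise.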
Open Source Code | No | The paper is theoretical and does not mention making any source code available. The NeurIPS checklist in the paper confirms this with "NA" for open access to data and code.
Open Datasets | No | The paper is theoretical and does not refer to any specific publicly available datasets for experimental training. The NeurIPS checklist in the paper states "NA" for experimental reproducibility.
Dataset Splits | No | The paper is theoretical and does not describe any dataset splits for validation. The NeurIPS checklist in the paper states "NA" for experimental reproducibility.
Hardware Specification | No | The paper is theoretical and does not describe any specific hardware used for experiments. The NeurIPS checklist in the paper states "NA" for experiments compute resources.
Software Dependencies | No | The paper is theoretical and does not describe any specific software dependencies with version numbers. The NeurIPS checklist in the paper states "NA" for experiments.
Experiment Setup | No | The paper is theoretical and does not describe any experimental setup details such as hyperparameters or training configurations. The NeurIPS checklist in the paper states "NA" for experiments.