Distribution Learning with Valid Outputs Beyond the Worst-Case

Authors: Nicholas Rittler, Kamalika Chaudhuri

NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | Prior work on this problem takes a worst-case stance, showing that proper learning requires an exponential number of validity queries, and demonstrating an improper algorithm which, while generating guarantees in a wide range of settings, makes an atypical polynomial number of validity queries. In this work, we take a first step towards characterizing regimes where guaranteeing validity is easier than in the worst case. We show that when the data distribution lies in the model class and the log-loss is minimized, the number of samples required to ensure validity has a weak dependence on the validity requirement. Additionally, we show that when the validity region belongs to a VC-class, a limited number of validity queries are often sufficient.
Researcher Affiliation | Academia | Nick Rittler, University of California San Diego (nrittler@ucsd.edu); Kamalika Chaudhuri, University of California San Diego (kamalika@cs.ucsd.edu)
Pseudocode | Yes | Algorithm 1: Modifying ERM to Yield Log-Loss Guarantees; Algorithm 2: Post-Hoc Restriction of ERM to an Estimate of Valid Outputs; Algorithm 3: Restriction to ERM under Log-Loss without Validity Assumption
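The post-hoc restriction idea behind Algorithm 2 can be illustrated with a minimal toy sketch; this is an assumption-laden illustration, not the paper's actual procedure. It fits an empirical distribution over a finite support by log-loss minimization (the MLE), then zeroes out mass on outputs an assumed validity oracle rejects and renormalizes. The function name `restrict_to_valid` and the toy oracle are hypothetical.

```python
import numpy as np

def restrict_to_valid(probs, is_valid):
    """Post-hoc restriction (toy version): zero out probability mass on
    invalid outputs, then renormalize over the estimated valid region."""
    mask = np.array([is_valid(x) for x in range(len(probs))], dtype=float)
    restricted = probs * mask
    total = restricted.sum()
    if total == 0:
        raise ValueError("no probability mass on valid outputs")
    return restricted / total

# Toy setting: support {0, ..., 4}, samples drawn from an unknown distribution.
samples = np.array([0, 0, 1, 1, 1, 2, 3, 3])
counts = np.bincount(samples, minlength=5)
mle = counts / counts.sum()      # empirical distribution = log-loss minimizer
is_valid = lambda x: x != 2      # hypothetical validity oracle: output 2 invalid
restricted = restrict_to_valid(mle, is_valid)
```

After restriction, `restricted` places zero mass on the invalid output and sums to one; in the worst case the valid region captures little of the learned mass, which is where the paper's query and sample complexity questions arise.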
Open Source Code | No | The paper is theoretical and does not mention making any source code available. The NeurIPS checklist in the paper confirms this with "NA" for open access to data and code.
Open Datasets | No | The paper is theoretical and does not refer to any specific publicly available datasets for experimental training. The NeurIPS checklist in the paper states "NA" for experimental reproducibility.
Dataset Splits | No | The paper is theoretical and does not describe any dataset splits for validation. The NeurIPS checklist in the paper states "NA" for experimental reproducibility.
Hardware Specification | No | The paper is theoretical and does not describe any specific hardware used for experiments. The NeurIPS checklist in the paper states "NA" for experiments compute resources.
Software Dependencies | No | The paper is theoretical and does not describe any specific software dependencies with version numbers. The NeurIPS checklist in the paper states "NA" for experiments.
Experiment Setup | No | The paper is theoretical and does not describe any experimental setup details such as hyperparameters or training configurations. The NeurIPS checklist in the paper states "NA" for experiments.