List and Certificate Complexities in Replicable Learning

Authors: Peter Dixon, A. Pavan, Jason Vander Woude, N. V. Vinodchandran

NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Theoretical We investigate replicable learning algorithms. Informally a learning algorithm is replicable if the algorithm outputs the same canonical hypothesis over multiple runs with high probability, even when different runs observe a different set of samples from the unknown data distribution. In general, such a strong notion of replicability is not achievable. Thus we consider two feasible notions of replicability called list replicability and certificate replicability. The goal is to design learning algorithms with optimal list and certificate complexities while minimizing the sample complexity. Our contributions are the following. We first study the learning task of estimating the biases of d coins, up to an additive error of ε, by observing samples. For this task, we design a (d + 1)-list replicable algorithm. To complement this result, we establish that the list complexity is optimal, i.e there are no learning algorithms with a list size smaller than d + 1 for this task. We also design learning algorithms with certificate complexity O(log d). The sample complexity of both these algorithms is O( d2 ε2 ) where ε is the approximation error parameter (for a constant error probability). In the PAC model, we show that any hypothesis class that is learnable with d-nonadaptive statistical queries can be learned via a (d + 1)-list replicable algorithm and also via a O(log d)-certificate replicable algorithm. The sample complexity of both these algorithms is O( d2 ν2 ) where ν is the approximation error of the statistical query. We also show that for the concept class d THRESHOLD, the list complexity is exactly d + 1 with respect to the uniform distribution. To establish our upper bound results we use rounding schemes induced by geometric partitions with certain properties. We use Sperner/KKM Lemma to establish the lower bound results.
Researcher Affiliation Academia Peter Dixon1, A. Pavan 2, Jason Vander Woude3, and N. V. Vinodchandran4 1Independent Researcher tooplark@gmail.com 2Iowa State University pavan@cs.iastate.edu 3University of Nebraska-Lincoln jasonvw@huskers.unl.edu 4University of Nebraska-Lincoln vinod@cse.unl.edu
Pseudocode Yes Algorithm 1 (d + 1)-list replicable algorithm for d-COIN BIAS ESTIMATION PROBLEM
Open Source Code No The paper does not contain any statement about making its source code publicly available, nor does it provide a link to a code repository.
Open Datasets No The paper is theoretical and focuses on algorithm design and proofs for learning tasks like 'estimating the biases of d coins' and 'PAC learning'. It does not refer to specific real-world datasets used for empirical training or provide access information for any dataset.
Dataset Splits No The paper is theoretical and does not describe empirical experiments involving dataset splits for training, validation, or testing.
Hardware Specification No The paper does not specify any hardware used for computations or experiments. This is consistent with a theoretical research paper.
Software Dependencies No The paper is theoretical and does not mention any specific software dependencies or their version numbers used for implementation or proofs.
Experiment Setup No The paper is theoretical and does not describe empirical experiments, thus no experimental setup details such as hyperparameters or training configurations are provided.