Convergence Rates of Active Learning for Maximum Likelihood Estimation

Authors: Kamalika Chaudhuri, Sham M. Kakade, Praneeth Netrapalli, Sujay Sanghavi

NeurIPS 2015 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | Previous theoretical work has rigorously characterized the label complexity of active learning, but most of this work has focused on the PAC or the agnostic PAC model. In this paper, we shift our attention to a more general setting: maximum likelihood estimation. Provided certain conditions hold on the model class, we give a two-stage active learning algorithm for this problem. The conditions we require are fairly general, and cover the widely popular class of Generalized Linear Models, which, in turn, include models for binary and multi-class classification, regression, and conditional random fields. We provide an upper bound on the label requirement of our algorithm, and a lower bound that matches it up to lower order terms. Our analysis shows that, unlike binary classification in the realizable case, just a single extra round of interaction is sufficient to achieve near-optimal performance in maximum likelihood estimation.
Researcher Affiliation | Collaboration | Dept. of CS, University of California at San Diego (kamalika@cs.ucsd.edu); Dept. of CS and of Statistics, University of Washington (sham@cs.washington.edu); Microsoft Research New England (praneeth@microsoft.com); Dept. of ECE, The University of Texas at Austin (sanghavi@mail.utexas.edu).
Pseudocode | Yes | Algorithm 1 (Active Set Select).
Input: samples x_i, for i = 1, ..., n (the unlabeled pool U).
1: Draw m_1 samples u.a.r. from U, and query their labels to get S_1.
2: Use S_1 to solve the MLE problem: \hat{\theta}_1 = \arg\min_{\theta \in \Theta} \sum_{(x,y) \in S_1} -\log p(y \mid x, \theta).
3: Solve the following SDP (refer Lemma 3): a^* = \arg\min_a \mathrm{Tr}\big( (\sum_i a_i I(x_i, \hat{\theta}_1))^{-1} \big) subject to 0 \le a_i \le 1, \sum_i a_i = m_2.
4: Draw m_2 examples using probability \Gamma = \lambda \Gamma_1 + (1 - \lambda) U, where the distribution \Gamma_1 = a^* / m_2. Query their labels to get S_2.
5: Use S_2 to solve the MLE problem: \hat{\theta}_2 = \arg\min_{\theta \in \Theta} \sum_{(x,y) \in S_2} -\log p(y \mid x, \theta).
(A runnable sketch of this two-stage procedure, specialized to logistic regression, follows the table.)
Open Source Code | No | The paper does not provide any statements or links indicating that source code for the described methodology is available.
Open Datasets | No | The paper is theoretical and does not use or describe any specific datasets for training or experimentation, so there is no information about public availability.
Dataset Splits | No | The paper is theoretical and does not describe experimental setups with training/validation/test dataset splits.
Hardware Specification | No | The paper is theoretical and does not discuss hardware specifications for running experiments.
Software Dependencies | No | The paper is theoretical and does not discuss software dependencies or version numbers needed for experimental reproducibility.
Experiment Setup | No | The paper is theoretical and focuses on algorithm design and proofs; it does not include details of an experimental setup such as hyperparameters or system-level training settings.
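Since no code accompanies the paper, the following is a minimal, hypothetical Python sketch of Algorithm 1 specialized to logistic regression (one of the GLMs covered by the paper's conditions), under stated assumptions rather than a definitive implementation. The names oracle, fit_mle, select_weights, and active_set_select, the mixing weight lam, and the use of cvxpy's matrix_frac atom to encode the trace objective are illustration choices, not the authors' implementation; labels are assumed to be in {-1, +1}.

# Hypothetical sketch of Algorithm 1 (Active Set Select) for logistic
# regression; NOT the authors' code (none was released with the paper).
import numpy as np
import cvxpy as cp
from scipy.optimize import minimize


def fit_mle(X, y):
    """Logistic-regression MLE: minimize sum_i log(1 + exp(-y_i x_i^T theta)),
    with labels y in {-1, +1}."""
    d = X.shape[1]
    nll = lambda theta: np.sum(np.logaddexp(0.0, -y * (X @ theta)))
    return minimize(nll, np.zeros(d), method="BFGS").x


def select_weights(X, theta, m2):
    """Step 3 (the SDP). For logistic regression the Fisher information of a
    point x is I(x, theta) = p(1-p) x x^T with p = sigmoid(x^T theta), so
    sum_i a_i I(x_i, theta) = X^T diag(a * w) X with w_i = p_i (1 - p_i)."""
    n, d = X.shape
    p = 1.0 / (1.0 + np.exp(-(X @ theta)))
    w = p * (1.0 - p)
    a = cp.Variable(n, nonneg=True)
    M = X.T @ cp.diag(cp.multiply(a, w)) @ X           # affine in a
    objective = cp.Minimize(cp.matrix_frac(np.eye(d), M))  # Tr(M^{-1})
    constraints = [a <= 1, cp.sum(a) == m2]
    cp.Problem(objective, constraints).solve()
    return np.clip(a.value, 0.0, None)


def active_set_select(X, oracle, m1, m2, lam=0.9, seed=0):
    """Two-stage procedure: a uniform stage to get a rough MLE, then sampling
    from the mixture Gamma = lam * Gamma_1 + (1 - lam) * Uniform."""
    n, _ = X.shape
    rng = np.random.default_rng(seed)
    # Stage 1: m1 labels drawn u.a.r. from the pool -> rough MLE theta_1.
    idx1 = rng.choice(n, size=m1, replace=True)
    theta1 = fit_mle(X[idx1], oracle(idx1))
    # Stage 2: optimized distribution Gamma_1 = a* / m2, mixed with uniform.
    a_star = select_weights(X, theta1, m2)
    gamma1 = a_star / a_star.sum()
    gamma = lam * gamma1 + (1.0 - lam) / n
    gamma = gamma / gamma.sum()                        # guard against round-off
    idx2 = rng.choice(n, size=m2, replace=True, p=gamma)
    return fit_mle(X[idx2], oracle(idx2))

Here oracle is a hypothetical stand-in for the labeling oracle: a callable taking pool indices and returning their +/-1 labels. Note that matrix_frac(np.eye(d), M) encodes the plain objective Tr(M^{-1}) shown in the reconstructed step 3; if the paper's Lemma 3 objective additionally weights the trace by the pool Fisher information, the identity matrix would be replaced by an estimate of I_U(\hat{\theta}_1).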