Convergence Rates of Active Learning for Maximum Likelihood Estimation
Authors: Kamalika Chaudhuri, Sham M. Kakade, Praneeth Netrapalli, Sujay Sanghavi
NeurIPS 2015
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | Previous theoretical work has rigorously characterized the label complexity of active learning, but most of this work has focused on the PAC or the agnostic PAC model. In this paper, we shift our attention to a more general setting: maximum likelihood estimation. Provided certain conditions hold on the model class, we provide a two-stage active learning algorithm for this problem. The conditions we require are fairly general, and cover the widely popular class of Generalized Linear Models, which in turn include models for binary and multi-class classification, regression, and conditional random fields. We provide an upper bound on the label requirement of our algorithm, and a lower bound that matches it up to lower-order terms. Our analysis shows that, unlike binary classification in the realizable case, just a single extra round of interaction is sufficient to achieve near-optimal performance in maximum likelihood estimation. |
| Researcher Affiliation | Collaboration | Dept. of CS, University of California at San Diego (kamalika@cs.ucsd.edu); Dept. of CS and of Statistics, University of Washington (sham@cs.washington.edu); Microsoft Research New England (praneeth@microsoft.com); Dept. of ECE, The University of Texas at Austin (sanghavi@mail.utexas.edu) |
| Pseudocode | Yes | Algorithm 1 (Active Set Select). Input: samples xᵢ for i = 1, …, n. 1: Draw m₁ samples u.a.r. from U, and query their labels to get S₁. 2: Use S₁ to solve the MLE problem: θ̂₁ = argmin_{θ∈Θ} Σ_{(x,y)∈S₁} −log p(y\|x; θ). 3: Solve the following SDP (refer Lemma 3): a* = argmin_a Tr((Σᵢ aᵢ I(xᵢ, θ̂₁))⁻¹) subject to 0 ≤ aᵢ ≤ 1 and a budget constraint on Σᵢ aᵢ. 4: Draw m₂ examples using probability Γ = αΓ₁ + (1−α)U, where Γ₁ is the distribution obtained by normalizing a*. Query their labels to get S₂. 5: Use S₂ to solve the MLE problem: θ̂₂ = argmin_{θ∈Θ} Σ_{(x,y)∈S₂} −log p(y\|x; θ). |
| Open Source Code | No | The paper does not provide any explicit statements or links indicating that source code for the described methodology is available. |
| Open Datasets | No | The paper is theoretical and does not use or describe any specific datasets for training or experimentation, so dataset availability is not applicable. |
| Dataset Splits | No | The paper is theoretical and does not describe experimental setups with training/validation/test dataset splits. |
| Hardware Specification | No | The paper is theoretical and does not discuss hardware specifications for running experiments. |
| Software Dependencies | No | The paper is theoretical and does not discuss specific software dependencies with version numbers for experimental reproducibility. |
| Experiment Setup | No | The paper is theoretical and focuses on algorithm design and proofs; it does not include details on experimental setup such as hyperparameters or system-level training settings. |
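To make the two-stage procedure in the Pseudocode row concrete, here is a minimal Python sketch for logistic regression (one member of the GLM family the paper covers). This is not the paper's exact method: the SDP of step 3 is replaced by a cheap heuristic proxy that weights each point by the trace of its pointwise Fisher information, the MLE oracle is plain gradient ascent, and all function names and hyperparameters below are our own assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic_mle(X, y, steps=2000, lr=0.5):
    # Gradient ascent on the logistic log-likelihood; stands in for the
    # paper's generic MLE oracle (steps 2 and 5 of Algorithm 1).
    theta = np.zeros(X.shape[1])
    for _ in range(steps):
        theta += lr * X.T @ (y - sigmoid(X @ theta)) / len(y)
    return theta

def two_stage_active_mle(X_pool, query_label, m1, m2, alpha=0.9, seed=0):
    rng = np.random.default_rng(seed)
    n = len(X_pool)

    # Stage 1: query m1 labels uniformly at random and fit an initial MLE.
    idx1 = rng.choice(n, size=m1, replace=False)
    theta1 = fit_logistic_mle(X_pool[idx1], query_label(idx1))

    # Stage 2 weights: the paper solves an SDP to pick the sampling
    # distribution Gamma1; here we use a heuristic proxy, weighting each
    # point by tr I(x, theta1) = p(1-p) * ||x||^2 for logistic regression.
    p = sigmoid(X_pool @ theta1)
    w = p * (1.0 - p) * np.sum(X_pool**2, axis=1)
    gamma1 = w / w.sum()

    # Mix with uniform as in step 4: Gamma = alpha*Gamma1 + (1-alpha)*U.
    gamma = alpha * gamma1 + (1.0 - alpha) / n
    idx2 = rng.choice(n, size=m2, replace=True, p=gamma)
    return fit_logistic_mle(X_pool[idx2], query_label(idx2))

# Synthetic demo: an unlabeled pool whose labels follow a known logistic model.
rng = np.random.default_rng(42)
theta_true = np.array([2.0, -1.0])
X_pool = rng.standard_normal((2000, 2))
labels = (rng.random(2000) < sigmoid(X_pool @ theta_true)).astype(float)
theta_hat = two_stage_active_mle(X_pool, lambda idx: labels[idx], m1=100, m2=400)
```

The `query_label` callback models the label oracle, so the sketch only "spends" m1 + m2 label queries, which is the quantity the paper's upper and lower bounds control.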