Constants Matter: The Performance Gains of Active Learning
Authors: Stephen O. Mussmann, Sanjoy Dasgupta
ICML 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this work, we show through upper and lower bounds that, for a simple benign setting of well-specified logistic regression on a uniform distribution over a sphere, the expected excess error of both active learning and random sampling has the same inverse proportional dependence on the number of samples. Importantly, due to the nature of lower bounds, any more general setting does not allow a better dependence on the number of samples. Additionally, we show that a variant of uncertainty sampling can achieve a faster rate of convergence than random sampling by a factor of the Bayes error, a recent empirical observation made by other work. Qualitatively, this work is pessimistic with respect to the asymptotic dependence on the number of samples, but optimistic with respect to finding performance gains in the constants. [...] Finally, we present illustrative synthetic experimental results for our upper bounds, demonstrating the effect of the problem-dependent parameters in our setting. [...] In summary, our contributions are threefold: [...] Synthetic experiments for the logistic regression uniform sphere setting to illustrate our two upper bounds. (See the rate sketch after the table.) |
| Researcher Affiliation | Academia | 1Paul G. Allen School of Computer Science & Engineering, University of Washington, Seattle, USA 2Computer Science & Engineering, University of California, San Diego, USA. |
| Pseudocode | Yes | For adaptive sampling, we begin with random sampling for the first n/2 samples. Then, an estimate of the true weights is calculated by minimizing the logistic loss, or equivalently maximizing the likelihood. We then rescale the estimate to produce w1; this is done to ensure ‖w1‖ ≤ M with high probability. A constraint set W is constructed as the intersection of an origin-centered sphere of radius ‖w1‖ and a cone around w1. In the next phase, we proceed by iterations of uncertainty sampling gradient updates with an L2 orthogonal projection onto W. [...] This entire process is shown as Algorithm 1. (See the algorithm sketch after the table.) |
| Open Source Code | No | The paper does not contain any explicit statements or links indicating that the source code for the described methodology is publicly available. |
| Open Datasets | No | In this section, we run experiments in our synthetic setting: well-specified logistic regression with a uniform distribution over a radius-r sphere. [...] The paper uses a 'synthetic setting' for its experiments, meaning the data is generated rather than drawn from a publicly available dataset with concrete access information (e.g., a specific link, DOI, or repository name). (See the data-generation sketch after the table.) |
| Dataset Splits | No | The paper discusses the number of samples (n) used but does not explicitly provide details on how the synthetic data is split into training, validation, and test sets, either by percentages, absolute counts, or by referencing standard splits. |
| Hardware Specification | No | The paper does not specify any hardware details (e.g., GPU/CPU models, memory, or specific computer configurations) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific version numbers for any software libraries, frameworks, or programming languages used in its implementation or experiments. |
| Experiment Setup | Yes | We fix r = 1 in all cases since learning behavior only depends on the product Mr. We make one small change to the algorithm and set θmax = π/4 instead of θmax = min(π/4, 1/(3‖w1‖r)). We found experimentally that the latter would require a larger n. All experiments are run with 100 replicates, with error bars as 95% confidence intervals using a Gaussian approximation. (See the replicate sketch after the table.) |
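
**Rate sketch.** The comparison quoted in the Research Type row can be written schematically. The notation below (a problem-dependent constant c and Bayes error err*) is assumed here for illustration and is not the paper's exact bound statement:

```latex
% Schematic restatement of the rate comparison from the abstract; c is a
% problem-dependent constant and err^* the Bayes error (notation assumed here,
% not the paper's exact theorem statements).
\begin{align*}
  \mathbb{E}\big[\operatorname{err}(\hat{w}_n^{\,\mathrm{random}})\big] - \operatorname{err}^*
    &\asymp \frac{c}{n}, \\
  \mathbb{E}\big[\operatorname{err}(\hat{w}_n^{\,\mathrm{uncertainty}})\big] - \operatorname{err}^*
    &\asymp \frac{c \cdot \operatorname{err}^*}{n}.
\end{align*}
```

Both estimators share the 1/n dependence on the number of samples; the gain from uncertainty sampling shows up only in the constant, which shrinks by a factor of the Bayes error.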
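
**Data-generation sketch.** The synthetic setting from the Open Datasets row is fully specified by the paper's description: covariates uniform on a radius-r sphere and labels from a well-specified logistic model. A minimal Python sketch of such a generator, with function names and the random seed chosen here for illustration:

```python
# Minimal sketch of the synthetic setting: covariates uniform on a radius-r
# sphere in R^d, labels from a well-specified logistic model
# P(y = +1 | x) = sigmoid(w_star . x). Names and seed are illustrative.
import numpy as np

rng = np.random.default_rng(0)

def sample_sphere(m, d, r=1.0):
    """Draw m points uniformly from the surface of the radius-r sphere in R^d."""
    x = rng.standard_normal((m, d))
    return r * x / np.linalg.norm(x, axis=1, keepdims=True)

def query_labels(x, w_star):
    """Labels in {-1, +1} from the well-specified logistic model."""
    p = 1.0 / (1.0 + np.exp(-(x @ w_star)))
    return np.where(rng.random(len(x)) < p, 1.0, -1.0)
```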
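
**Algorithm sketch.** The two-phase procedure quoted in the Pseudocode row can be rendered loosely in Python. This is a sketch under stated assumptions, not the paper's Algorithm 1: the pool-based uncertainty rule, the step size `eta`, the sequential angle-then-norm projection, and all function names are choices made here. `sample_x(m)` and `query(x)` are callables, for example built from the generator functions in the sketch above.

```python
# Loose sketch of the two-phase procedure: random sampling + rescaled MLE,
# then uncertainty-sampling gradient updates projected onto a sphere/cone
# constraint set. Simplifications relative to the paper are noted inline.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic_mle(x, y, steps=2000, lr=0.5):
    """Minimize the logistic loss by plain gradient descent (stand-in for the MLE)."""
    w = np.zeros(x.shape[1])
    for _ in range(steps):
        grad = -(x * (y * sigmoid(-y * (x @ w)))[:, None]).mean(axis=0)
        w -= lr * grad
    return w

def project_to_constraint(w, w1, theta_max):
    """Approximate projection onto {||v|| = ||w1||, angle(v, w1) <= theta_max}:
    clip the angle toward w1, then rescale the norm (a simplification of the
    exact L2 projection onto the intersection)."""
    u1 = w1 / np.linalg.norm(w1)
    a = w @ u1
    perp = w - a * u1
    perp_norm = np.linalg.norm(perp)
    if np.arctan2(perp_norm, a) > theta_max and perp_norm > 0:
        direction = np.cos(theta_max) * u1 + np.sin(theta_max) * perp / perp_norm
    else:
        direction = w / max(np.linalg.norm(w), 1e-12)
    return np.linalg.norm(w1) * direction

def adaptive_sampling(n, sample_x, query, M, theta_max=np.pi / 4,
                      eta=0.1, pool_size=1000):
    # Phase 1: random sampling on the first n/2 points, then the rescaled MLE
    # (rescaling keeps ||w1|| <= M, matching the description quoted above).
    x1 = sample_x(n // 2)
    y1 = query(x1)
    w_hat = fit_logistic_mle(x1, y1)
    w1 = w_hat * min(1.0, M / max(np.linalg.norm(w_hat), 1e-12))

    # Phase 2: uncertainty-sampling gradient updates projected onto the constraint set.
    w = w1.copy()
    for _ in range(n - n // 2):
        pool = sample_x(pool_size)
        x = pool[np.argmin(np.abs(pool @ w))]   # most uncertain candidate in the pool
        y = query(x[None, :])[0]
        grad = -y * x * sigmoid(-y * (w @ x))   # single-sample logistic-loss gradient
        w = project_to_constraint(w - eta * grad, w1, theta_max)
    return w
```

Wired to the generator above, a call might look like `adaptive_sampling(n=1000, sample_x=lambda m: sample_sphere(m, d=10), query=lambda x: query_labels(x, w_star), M=1.0)`, where `w_star` is a fixed true weight vector with norm at most M.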
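
**Replicate sketch.** The error-bar protocol in the Experiment Setup row (100 replicates, 95% intervals via a Gaussian approximation) reduces to a short summary step. In the sketch below, `run_one_replicate` is a hypothetical callable standing in for one full run of the procedure evaluated on fresh test data; it is not a function from the paper.

```python
# Sketch of the replicate / error-bar protocol: 100 replicates, error bars as
# 95% confidence intervals under a Gaussian approximation.
import numpy as np

def replicate_and_summarize(run_one_replicate, n_replicates=100, z=1.96):
    """Run the (hypothetical) experiment n_replicates times and return the mean
    error with a Gaussian-approximation 95% confidence interval."""
    errors = np.array([run_one_replicate() for _ in range(n_replicates)])
    mean = errors.mean()
    half_width = z * errors.std(ddof=1) / np.sqrt(n_replicates)
    return mean, (mean - half_width, mean + half_width)
```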