On the Convergence of Loss and Uncertainty-based Active Learning Algorithms

Authors: Daniel Haimovich, Dima Karamshuk, Fridolin Linder, Niek Tax, Milan Vojnovic

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our numerical experiments demonstrate the efficiency of AWS on various datasets by using either exact or estimated loss values. In this section we evaluate our AWS algorithm, defined in Section 3.2. In particular, we focus on an instance of AWS with stochastic Polyak's expected step size for logistic regression and loss-based sampling proportional to the absolute error loss, which we refer to as Adaptive-Weight Sampling Polyak Absloss (AWS-PA). By Corollary 3.7, AWS-PA converges according to Theorem 3.6. Here we demonstrate convergence on real-world datasets and compare with other algorithms. (A code sketch of this sampling scheme appears after the table.)
Researcher Affiliation | Collaboration | Daniel Haimovich (Meta, Central Applied Science, danielha@meta.com); Dima Karamshuk (Meta, Central Applied Science, karamshuk@meta.com); Fridolin Linder (Meta, Central Applied Science, flinder@meta.com); Niek Tax (Meta, Central Applied Science, niek@meta.com); Milan Vojnović (London School of Economics, m.vojnovic@lse.ac.uk)
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | Yes | The implementation of the AWS-PA algorithm, along with all the other code to run the experimental setup described in this section, is available at https://www.github.com/facebookresearch/ Adaptive Weight Sampling.
Open Datasets | Yes | We use a modified version of the mushroom binary classification dataset [Chang and Lin, 2011]... Furthermore, we include five datasets that we selected at random from the 44 real-world datasets that were used in Yang and Loog [2018], a benchmark study of active learning for logistic regression: MNIST 3 vs 5 [Le Cun et al., 1998], parkinsons [Little et al., 2007], splice [Noordewier et al., 1990], tictactoe [Aha, 1991], and credit [Quinlan, 1987].
Dataset Splits | No | In our evaluation, we deliberately confine the training to a single epoch. Throughout this epoch, we sequentially process each data instance, compute the loss for each individual instance, and subsequently update the model's weights. This approach, known as progressive validation [Blum et al., 1999], enables us to monitor the evolution of the average loss. (A sketch of progressive validation appears after the table.)
Hardware Specification | Yes | All experiments were performed on a single machine with 72 CPU cores and 228 GB RAM.
Software Dependencies | No | We used the scikit-learn implementation of the RF regressor and manually tuned two hyperparameters for different datasets: (a) the number of tree estimators, and (b) the number of "warm-up" steps during which we sample content with a constant probability until we collect enough samples to train an RF estimator. (A configuration sketch appears after the table.)
Experiment Setup | Yes | For Polyak absloss and Polyak exponent we set the search space of η to [0.01, 1] and the search space of ρ to [0, 1]. Note that the values of η and ρ influence the rate of sampling. In line with the typical goal of active learning, we aim to learn efficiently and minimize loss under some desired rate of sampling. Therefore, for every configuration of η and ρ we use binary search to find the value of β that achieves some target empirical sampling rate. In the hyperparameter tuning we set the target empirical sampling rate to 50%. In our evaluation, we deliberately confine the training to a single epoch. (A binary-search sketch for β appears after the table.)
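
The AWS-PA instance quoted under Research Type combines loss-proportional sampling with a Polyak-type step for logistic regression. The following is a minimal sketch of one way such a single-pass loop could look; it assumes labels in {-1, +1}, a sampling probability min(1, β·|absolute error|), and the Polyak step γ = loss / ||gradient||². None of these choices are taken from the authors' code, and the paper's exact AWS-PA update (expected step size, importance weighting) may differ.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def aws_pa_sketch(X, y, beta=1.0, eps=1e-12, seed=0):
    """Illustrative single-pass loop: loss-proportional sampling plus a
    Polyak-type step for logistic regression. Labels y are in {-1, +1}.
    This is a reader's sketch, not the authors' AWS-PA implementation."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for x_i, y_i in zip(X, y):
        p = sigmoid(w @ x_i)                   # predicted P(y = +1 | x)
        abs_loss = abs(p - (y_i + 1) / 2)      # absolute error loss
        q = min(1.0, beta * abs_loss)          # sampling probability
        if rng.random() < q:                   # query the label and update
            margin = y_i * (w @ x_i)
            loss = np.log1p(np.exp(-margin))           # logistic loss
            grad = -y_i * sigmoid(-margin) * x_i       # its gradient
            step = loss / (grad @ grad + eps)          # Polyak step size (f* = 0)
            w = w - step * grad
    return w
```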
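
Progressive validation, as described under Dataset Splits, scores each incoming example with the current model before that example is used for training, and tracks the running average of these losses over the single epoch. A minimal sketch under that description, with hypothetical `predict_loss` and `model_update` callables standing in for the model:

```python
def progressive_validation(stream, predict_loss, model_update):
    """Progressive validation in the sense of Blum et al. [1999]: evaluate
    each example before training on it and record the running average loss.
    `predict_loss(x, y)` and `model_update(x, y)` are caller-supplied
    callables; they are not part of the paper's code."""
    total_loss, n = 0.0, 0
    avg_losses = []
    for x_i, y_i in stream:
        total_loss += predict_loss(x_i, y_i)   # score before updating
        n += 1
        avg_losses.append(total_loss / n)      # running progressive-validation loss
        model_update(x_i, y_i)                 # then update the model
    return avg_losses
```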
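
The RF-based loss estimation mentioned under Software Dependencies tunes (a) the number of tree estimators and (b) the number of warm-up steps during which instances are sampled with a constant probability. The snippet below is an illustration only, with placeholder data and hypothetical hyperparameter values; the paper does not report the tuned values.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Hypothetical values; the paper tunes both hyperparameters per dataset.
N_ESTIMATORS = 100    # (a) number of tree estimators
WARMUP_STEPS = 500    # (b) instances sampled at a constant rate before fitting

# Placeholder warm-up data standing in for (features, observed loss) pairs
# collected during the constant-probability warm-up phase.
rng = np.random.default_rng(0)
warmup_features = rng.normal(size=(WARMUP_STEPS, 10))
warmup_losses = rng.random(WARMUP_STEPS)

loss_estimator = RandomForestRegressor(n_estimators=N_ESTIMATORS)
loss_estimator.fit(warmup_features, warmup_losses)
estimated_loss = loss_estimator.predict(warmup_features[:1])  # stand-in for exact loss
```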
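
The Experiment Setup row describes a binary search over β so that a run hits a target empirical sampling rate (50% during tuning). One way to realize that search, assuming a hypothetical `sampling_rate(beta)` callable that runs a training pass and returns the fraction of labels queried, and assuming that fraction is non-decreasing in β:

```python
def find_beta(sampling_rate, target=0.5, lo=1e-4, hi=1e4, tol=1e-3, iters=50):
    """Binary search for beta such that the empirical sampling rate matches
    the target. `sampling_rate(beta)` is a hypothetical callable, not part
    of the paper's code, assumed monotone non-decreasing in beta."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        rate = sampling_rate(mid)
        if abs(rate - target) < tol:
            return mid
        if rate < target:
            lo = mid      # sampling too rarely: increase beta
        else:
            hi = mid      # sampling too often: decrease beta
    return (lo + hi) / 2
```

With a monotone sampling rate, the bracket [lo, hi] halves each iteration, so a few dozen iterations suffice for the tolerance used here.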