Active Labeling: Streaming Stochastic Gradients

Authors: Vivien Cabannes, Francis Bach, Vianney Perchet, Alessandro Rudi

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | As a proof of concept, we provide numerical simulations in Section 6. We conclude with a high-level discussion around our methods in Section 7. In this section, we illustrate the differences between our active method versus a classical passive method, for regression and classification problems.
Researcher Affiliation | Collaboration | Vivien Cabannes (Meta), Francis Bach (INRIA / ENS / PSL), Vianney Perchet (ENSAE), Alessandro Rudi (INRIA / ENS / PSL)
Pseudocode | Yes | Algorithm 1: Median regression with SGD. Data: a model f_θ for θ ∈ Θ, some data (X_i)_{i ≤ n}, a labeling budget T, a step size rule γ: N → R+. Result: a learned parameter θ̂ and the predictive function f̂ = f_θ̂. Initialize θ_0. For t = 1 to T: sample U_t uniformly on the sphere S^{m−1}; query ε = sign(⟨Y_t − z, U_t⟩) for z = f_{θ_{t−1}}(X_t); update the parameter θ_t = θ_{t−1} + γ(t) ε U_t^⊤ (D f_{θ_{t−1}}(X_t)). Output θ̂ = θ_T, or some average, e.g., θ̂ = T^{−1} Σ_{t=1}^T θ_t.
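The quoted pseudocode can be sketched in NumPy for a linear model f_θ(x) = θ^⊤φ(x). This is a hypothetical illustration, not the authors' implementation: the function name `active_median_sgd` and the feature map `phi` are assumptions; only a single sign bit per round is queried, as in Algorithm 1.

```python
import numpy as np

def active_median_sgd(phi, X, Y, T, gamma, m, rng=None):
    """Streaming SGD where round t queries only the single bit
    sign(<Y_t - f_theta(X_t), U_t>) for a random direction U_t.
    Hypothetical sketch of Algorithm 1 for a linear model."""
    rng = np.random.default_rng(rng)
    d = phi(X[0]).shape[0]
    theta = np.zeros((d, m))          # parameters of f_theta
    avg = np.zeros_like(theta)        # running average of the iterates
    for t in range(1, T + 1):
        x, y = X[t - 1], Y[t - 1]
        u = rng.standard_normal(m)
        u /= np.linalg.norm(u)        # U_t uniform on the sphere S^{m-1}
        feat = phi(x)
        z = theta.T @ feat            # prediction f_{theta_{t-1}}(X_t)
        eps = np.sign(np.dot(y - z, u))           # one-bit query
        theta += gamma(t) * eps * np.outer(feat, u)  # SGD update
        avg += (theta - avg) / t      # averaged estimate
    return avg
```

For the linear model, the Jacobian term U_t^⊤ D f_θ(X_t) reduces to the outer product φ(X_t) U_t^⊤ used in the update above.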
Open Source Code | Yes | Our code is available online at https://github.com/VivienCabannes/active-labeling.
Open Datasets | No | Let us begin with the regression problem that consists in estimating the function f* that maps x ∈ [0, 1] to sin(2πx) ∈ R... On Figure 1, we focus on estimating f* given data (X_i)_{i ∈ [T]} that are uniform on [0, 1] in the noiseless setting where Y_i = f*(X_i), based on the minimization of the absolute deviation loss... To illustrate the versatility of our method, we approach a classification problem through the median surrogate technique presented in Proposition 3. To do so, we consider the classification problem with m ∈ N classes, X = [0, 1] and the conditional distribution (Y | X) linearly interpolating between Diracs in y_1, y_2 and y_3 respectively for x = 0, x = 1/2 and x = 1, and the uniform distribution for x = 1/4 and x = 3/4; and X uniform on X \ ([1/4 − ε, 1/4 + ε] ∪ [3/4 − ε, 3/4 + ε]).
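The quoted noiseless regression setting (X_i uniform on [0, 1], Y_i = sin(2πX_i)) is easy to regenerate; this helper is a hypothetical sketch for illustration, not part of the authors' released code.

```python
import numpy as np

def make_regression_data(T, rng=None):
    """Synthetic data for the quoted setting: X_i uniform on [0, 1],
    noiseless labels Y_i = f*(X_i) = sin(2*pi*X_i)."""
    rng = np.random.default_rng(rng)
    X = rng.uniform(0.0, 1.0, size=T)
    Y = np.sin(2 * np.pi * X)
    return X, Y
```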
Dataset Splits | No | In practice, one might not know a priori the parameter M but could nonetheless find the right scaling for γ based on cross-validation. The left plot on Figure 1 corresponds to an instance of SGD on such an objective based on the data (X_i, S_i), while the right plot corresponds to Algorithm 1.
Hardware Specification | No | The experiments were run on a personal laptop and did not require significant computing resources.
Software Dependencies | No | NumPy and LIBSVM are under Berkeley Software Distribution licenses (respectively the liberal and revised ones); Python and Matplotlib are under the Python Software Foundation license.
Experiment Setup | Yes | We take the same hyperparameters for both plots, a bandwidth σ = 0.2 and an SGD step size γ = 0.3. In order to consider the streaming setting where T is not known in advance, we consider the decreasing step size γ(t) = γ0/√t; and to smooth out the stochasticity due to random gradients, we consider the average estimate θ̄_t = (θ_1 + … + θ_t)/t. The left figure corresponds to the noiseless regression setting of Figure 1, with γ0 = 1.
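The reported schedule, a decreasing step size so that T need not be known in advance plus averaged iterates to smooth gradient noise, can be sketched as follows. This is a minimal assumed implementation, not the authors' code; the γ0/√t form follows the quoted setup.

```python
import numpy as np

GAMMA0 = 1.0  # gamma_0 = 1 in the quoted regression experiment

def gamma(t):
    """Decreasing step size gamma(t) = gamma_0 / sqrt(t)."""
    return GAMMA0 / np.sqrt(t)

def averaged_iterates(thetas):
    """Running average theta_bar_t = (theta_1 + ... + theta_t) / t
    for every t, given the sequence of SGD iterates."""
    thetas = np.asarray(thetas, dtype=float)
    counts = np.arange(1, len(thetas) + 1)[:, None]
    return np.cumsum(thetas, axis=0) / counts
```

Averaging the iterates rather than returning θ_T is the standard Polyak–Ruppert remedy for the variance of stochastic gradients.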