Active Labeling: Streaming Stochastic Gradients
Authors: Vivien Cabannes, Francis Bach, Vianney Perchet, Alessandro Rudi
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | As a proof of concept, we provide numerical simulations in Section 6. We conclude with a high-level discussion around our methods in Section 7. In this section, we illustrate the differences between our active method and a classical passive method, for regression and classification problems. |
| Researcher Affiliation | Collaboration | Vivien Cabannes (Meta); Francis Bach (INRIA / ENS / PSL); Vianney Perchet (ENSAE); Alessandro Rudi (INRIA / ENS / PSL) |
| Pseudocode | Yes | Algorithm 1: Median regression with SGD. Data: a model θ ↦ f_θ for θ ∈ Θ, some data (x_i)_{i ≤ n}, a labeling budget T, a step size rule γ: ℕ → ℝ₊. Result: a learned parameter θ̂ and the predictive function f̂ = f_θ̂. Initialize θ₀. For t = 1 to T: sample u_t uniformly on the sphere 𝕊^{m−1}; query ε = sign(⟨y_t − z, u_t⟩) for z = f_{θ_{t−1}}(x_t); update the parameter θ_t = θ_{t−1} + γ(t) ε u_tᵀ (D_θ f_{θ_{t−1}}(x_t)). Output θ̂ = θ_T, or some average, e.g., θ̂ = T⁻¹ Σ_{t=1}^T θ_t. |
| Open Source Code | Yes | Our code is available online at https://github.com/VivienCabannes/active-labeling. |
| Open Datasets | No | Let us begin with the regression problem that consists in estimating the function f* that maps x ∈ [0, 1] to sin(2πx) ∈ ℝ... On Figure 1, we focus on estimating f* given data (x_i)_{i ∈ [n]} that are uniform on [0, 1] in the noiseless setting where y_i = f*(x_i), based on the minimization of the absolute deviation loss... To illustrate the versatility of our method, we approach a classification problem through the median surrogate technique presented in Proposition 3. To do so, we consider the classification problem with m ∈ ℕ classes, X = [0, 1] and the conditional distribution (y | x) linearly interpolating between Diracs in y₁, y₂ and y₃ respectively for x = 0, x = 1/2 and x = 1 and the uniform distribution for x = 1/4 and x = 3/4; and ρ_X uniform on X \ ([1/4 − δ, 1/4 + δ] ∪ [3/4 − δ, 3/4 + δ]). |
| Dataset Splits | No | In practice, one might not know the relevant problem parameter a priori but could nonetheless find the right scaling for γ based on cross-validation. The left plot on Figure 1 corresponds to an instance of SGD on such an objective based on the data (x_i, y_i), while the right plot corresponds to Algorithm 1. |
| Hardware Specification | No | The experiments were run on a personal laptop and did not require significant compute. |
| Software Dependencies | No | Numpy and LIBSVM are under Berkeley Software Distribution licenses (respectively the liberal and revised ones), Python and matplotlib are under the Python Software Foundation license. |
| Experiment Setup | Yes | We take the same hyperparameters for both plots, a bandwidth σ = 0.2 and an SGD step size γ = 0.3. In order to consider the streaming setting where T is not known in advance, we consider the decreasing step size γ(t) = γ₀/√t; and to smooth out the stochasticity due to random gradients, we consider the average estimate θ̄_t = (θ₁ + ... + θ_t)/t. The left figure corresponds to the noiseless regression setting of Figure 1, with γ₀ = 1. |
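To make the quoted pseudocode and experiment setup concrete, here is a minimal sketch of Algorithm 1 (median regression driven by one-bit sign queries) on the paper's toy target sin(2πx). The parametrization is an assumption: the paper does not spell out its model here, so we use a hypothetical linear model on normalized Gaussian features with the reported bandwidth σ = 0.2, together with the decreasing step size γ(t) = γ₀/√t (γ₀ = 1) and the averaged iterate described in the setup row. With a one-dimensional output, the uniform direction u_t on the sphere reduces to a random sign.

```python
import numpy as np

rng = np.random.default_rng(0)

def f_star(x):
    # Noiseless regression target from the experiment: sin(2*pi*x) on [0, 1].
    return np.sin(2 * np.pi * x)

# Hypothetical parametrization (an assumption, not the paper's exact model):
# f_theta(x) = <theta, phi(x)> with normalized Gaussian features.
sigma = 0.2                              # bandwidth reported in the setup
centers = np.linspace(0.0, 1.0, 50)      # arbitrary feature centers

def features(x):
    phi = np.exp(-((x - centers) ** 2) / (2.0 * sigma ** 2))
    return phi / np.linalg.norm(phi)

T, gamma0 = 5000, 1.0                    # labeling budget, step-size scale
theta = np.zeros_like(centers)           # theta_0
theta_avg = np.zeros_like(centers)       # running average of iterates

for t in range(1, T + 1):
    x_t = rng.uniform()                  # streaming, unlabeled input
    phi = features(x_t)
    z = theta @ phi                      # z = f_{theta_{t-1}}(x_t)
    u = rng.choice([-1.0, 1.0])          # uniform on the sphere (1-D output)
    eps = np.sign((f_star(x_t) - z) * u) # one-bit query: sign<y - z, u>
    theta = theta + (gamma0 / np.sqrt(t)) * eps * u * phi
    theta_avg += (theta - theta_avg) / t # average estimate (theta_1+...+theta_t)/t

grid = np.linspace(0.0, 1.0, 200)
preds = np.array([features(x) @ theta_avg for x in grid])
mae = np.mean(np.abs(preds - f_star(grid)))
```

Note that since ε·u = sign(y − z) when u ∈ {−1, +1}, each update is exactly a stochastic subgradient step on the absolute deviation loss, while the learner only ever observes one bit of label information per query.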