Active Labeling: Streaming Stochastic Gradients

Authors: Vivien Cabannes, Francis Bach, Vianney Perchet, Alessandro Rudi

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | As a proof of concept, we provide numerical simulations in Section 6. We conclude with a high-level discussion around our methods in Section 7. In this section, we illustrate the differences between our active method versus a classical passive method, for regression and classification problems.
Researcher Affiliation | Collaboration | Vivien Cabannes (Meta), Francis Bach (INRIA / ENS / PSL), Vianney Perchet (ENSAE), Alessandro Rudi (INRIA / ENS / PSL)
Pseudocode | Yes | Algorithm 1: Median regression with SGD. Data: a model f_θ for θ ∈ Θ, some data (X_i)_{i ≤ n}, a labeling budget T, a step size rule γ: N → R+. Result: a learned parameter θ̂ and the predictive function f̂ = f_θ̂. Initialize θ_0. For t = 1 to T: sample U_t uniformly on the sphere S^{m−1}; query ε = sign(⟨Y_t − z, U_t⟩) for z = f_{θ_{t−1}}(X_t); update the parameter θ_t = θ_{t−1} + γ(t) ε U_t^⊤ (D f_{θ_{t−1}}(X_t)). Output θ̂ = θ_T, or some average, e.g., θ̂ = T^{−1} Σ_{t=1}^T θ_t.
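The quoted pseudocode can be sketched in NumPy for a linear model f_θ(x) = θ^⊤φ(x). This is a hypothetical illustration, not the authors' implementation: the function name `active_median_sgd` and the feature map `phi` are assumptions; only a single sign bit per round is queried, as in Algorithm 1.

```python
import numpy as np

def active_median_sgd(phi, X, Y, T, gamma, m, rng=None):
    """Streaming SGD where round t queries only the single bit
    sign(<Y_t - f_theta(X_t), U_t>) for a random direction U_t.
    Hypothetical sketch of Algorithm 1 for a linear model."""
    rng = np.random.default_rng(rng)
    d = phi(X[0]).shape[0]
    theta = np.zeros((d, m))          # parameters of f_theta
    avg = np.zeros_like(theta)        # running average of the iterates
    for t in range(1, T + 1):
        x, y = X[t - 1], Y[t - 1]
        u = rng.standard_normal(m)
        u /= np.linalg.norm(u)        # U_t uniform on the sphere S^{m-1}
        feat = phi(x)
        z = theta.T @ feat            # prediction f_{theta_{t-1}}(X_t)
        eps = np.sign(np.dot(y - z, u))           # one-bit query
        theta += gamma(t) * eps * np.outer(feat, u)  # SGD update
        avg += (theta - avg) / t      # averaged estimate
    return avg
```

For the linear model, the Jacobian term U_t^⊤ D f_θ(X_t) reduces to the outer product φ(X_t) U_t^⊤ used in the update above.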
Open Source Code | Yes | Our code is available online at https://github.com/VivienCabannes/active-labeling.
Open Datasets | No | Let us begin with the regression problem that consists in estimating the function f* that maps x ∈ [0, 1] to sin(2πx) ∈ R... On Figure 1, we focus on estimating f* given data (X_i)_{i ∈ [T]} that are uniform on [0, 1] in the noiseless setting where Y_i = f*(X_i), based on the minimization of the absolute deviation loss... To illustrate the versatility of our method, we approach a classification problem through the median surrogate technique presented in Proposition 3. To do so, we consider the classification problem with m ∈ N classes, X = [0, 1] and the conditional distribution (Y | X) linearly interpolating between Diracs in y_1, y_2 and y_3 respectively for x = 0, x = 1/2 and x = 1, and the uniform distribution for x = 1/4 and x = 3/4; and X uniform on X \ ([1/4 − ε, 1/4 + ε] ∪ [3/4 − ε, 3/4 + ε]).
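The quoted noiseless regression setting (X_i uniform on [0, 1], Y_i = sin(2πX_i)) is easy to regenerate; this helper is a hypothetical sketch for illustration, not part of the authors' released code.

```python
import numpy as np

def make_regression_data(T, rng=None):
    """Synthetic data for the quoted setting: X_i uniform on [0, 1],
    noiseless labels Y_i = f*(X_i) = sin(2*pi*X_i)."""
    rng = np.random.default_rng(rng)
    X = rng.uniform(0.0, 1.0, size=T)
    Y = np.sin(2 * np.pi * X)
    return X, Y
```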
Dataset Splits | No | In practice, one might not know a priori the parameter M but could nonetheless find the right scaling for γ based on cross-validation. The left plot on Figure 1 corresponds to an instance of SGD on such an objective based on the data (X_i, S_i), while the right plot corresponds to Algorithm 1.
Hardware Specification | No | The experiments were run on a personal laptop and did not require significant computing resources.
Software Dependencies | No | NumPy and LIBSVM are under Berkeley Software Distribution licenses (respectively the liberal and revised ones); Python and Matplotlib are under the Python Software Foundation license.
Experiment Setup | Yes | We take the same hyperparameters for both plots, a bandwidth σ = 0.2 and an SGD step size γ = 0.3. In order to consider the streaming setting where T is not known in advance, we consider the decreasing step size γ(t) = γ0/√t; and to smooth out the stochasticity due to random gradients, we consider the average estimate θ̄_t = (θ_1 + … + θ_t)/t. The left figure corresponds to the noiseless regression setting of Figure 1, with γ0 = 1.
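The reported schedule, a decreasing step size so that T need not be known in advance plus averaged iterates to smooth gradient noise, can be sketched as follows. This is a minimal assumed implementation, not the authors' code; the γ0/√t form follows the quoted setup.

```python
import numpy as np

GAMMA0 = 1.0  # gamma_0 = 1 in the quoted regression experiment

def gamma(t):
    """Decreasing step size gamma(t) = gamma_0 / sqrt(t)."""
    return GAMMA0 / np.sqrt(t)

def averaged_iterates(thetas):
    """Running average theta_bar_t = (theta_1 + ... + theta_t) / t
    for every t, given the sequence of SGD iterates."""
    thetas = np.asarray(thetas, dtype=float)
    counts = np.arange(1, len(thetas) + 1)[:, None]
    return np.cumsum(thetas, axis=0) / counts
```

Averaging the iterates rather than returning θ_T is the standard Polyak–Ruppert remedy for the variance of stochastic gradients.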