Online Variance Reduction with Mixtures

Authors: Zalán Borsos, Sebastian Curi, Kfir Yehuda Levy, Andreas Krause

ICML 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we evaluate our method experimentally. The experiments are designed to illustrate the underlying principles of the algorithm as well as the beneficial effects of variance reduction in various real-world domains.
Researcher Affiliation | Academia | Department of Computer Science, ETH Zurich. Correspondence to: Zalán Borsos <zalan.borsos@inf.ethz.ch>.
Pseudocode | Yes | Algorithm 1 ONS, Algorithm 2 VRM, Algorithm 3 Projection
Open Source Code | Yes | The code is available at https://github.com/zalanborsos/variance-reduction-mixtures
Open Datasets | Yes | We solve linear regression on a synthetic dataset of size n = 1 000 and dimension d = 10 generated as follows: the features are drawn from a multivariate normal distribution with random means and variances for each dimension. ... on the Cartpole environment of the Gym (Brockman et al., 2016). ... We train the algorithms on 80% of the data. For the mixture sampler, we perform an additional 80%-20% split of the training data, in order to choose the hyperparameters β and γ. We report the loss on the test sets of the datasets presented in Table 1 (KDD Cup 2004; Faulkner et al., 2011; LeCun et al., 1998)
Dataset Splits | Yes | We train the algorithms on 80% of the data. For the mixture sampler, we perform an additional 80%-20% split of the training data, in order to choose the hyperparameters β and γ.
Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU models, or memory specifications used for running the experiments.
Software Dependencies | No | The paper mentions software like "Gym", but does not specify version numbers for any software dependencies, which is required for reproducibility.
Experiment Setup | Yes | We run 5 epochs of online gradient descent for SVM with step size 0.01/sqrt(t) at iteration t. ... The optimization is performed with minibatch SGD with step size 10^-4/sqrt(t) in round t over 100 epochs and batch size of 5. ... ϵ = {0.01, 0.1, 1} and α = {0.1, 0.5, 0.9}. ... We use batch size b = 100 and number of clusters k = 100, and initialize the centers via k-means++ (Arthur & Vassilvitskii, 2007).
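
The rows above quote several concrete experimental details; the short sketches below illustrate a few of them. First, the Open Datasets row describes the synthetic regression data: n = 1 000 points with d = 10 features drawn from a multivariate normal with random per-dimension means and variances. The following is a minimal sketch of that generation process; the variance range, ground-truth weights, and noise level are placeholder assumptions, since the excerpt does not specify them.

```python
# Hedged sketch of the synthetic regression data from the "Open Datasets" row:
# n = 1000 points, d = 10 features, each dimension drawn from a normal
# distribution with a random mean and variance. The variance range, true
# weights, and noise scale below are assumptions, not taken from the paper.
import numpy as np

rng = np.random.default_rng(0)
n, d = 1000, 10

means = rng.normal(size=d)                  # random mean for each dimension
variances = rng.uniform(0.5, 2.0, size=d)   # random variance per dimension (assumed range)
X = rng.normal(loc=means, scale=np.sqrt(variances), size=(n, d))

w_true = rng.normal(size=d)                 # assumed ground-truth regression weights
y = X @ w_true + 0.1 * rng.normal(size=n)   # assumed small Gaussian label noise
```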
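The Dataset Splits row quotes an 80% training split, with a further 80%-20% split of the training data used to select the mixture sampler's hyperparameters β and γ. Below is a minimal sketch of that protocol, assuming scikit-learn's train_test_split; the paper does not name the library it used.

```python
# Hedged sketch of the splitting protocol from the "Dataset Splits" row:
# 80% train / 20% test, then a second 80%/20% split of the training portion
# to choose the hyperparameters beta and gamma.
import numpy as np
from sklearn.model_selection import train_test_split

X = np.random.default_rng(0).normal(size=(1000, 10))  # placeholder features
y = np.random.default_rng(1).normal(size=1000)        # placeholder targets

# Outer split: 80% of the data for training, 20% held out for reporting test loss.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
# Inner split of the training data: 80% for fitting, 20% for selecting beta and gamma.
X_fit, X_val, y_fit, y_val = train_test_split(X_train, y_train, test_size=0.2, random_state=0)
```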
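The Experiment Setup row quotes decaying step sizes: 0.01/sqrt(t) for online gradient descent on the SVM, and 10^-4/sqrt(t) for minibatch SGD over 100 epochs with batch size 5. The sketch below only illustrates that schedule inside a generic minibatch SGD loop, with a least-squares gradient as a stand-in loss; it is not the authors' variance-reduced sampler.

```python
# Hedged sketch of the 1/sqrt(t) step-size schedule quoted in the
# "Experiment Setup" row, applied in a plain minibatch SGD loop.
# The squared loss is an assumed stand-in for the actual objective.
import numpy as np

def sgd_sqrt_decay(X, y, base_lr=1e-4, epochs=100, batch_size=5, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    t = 1  # global round counter used by the schedule
    for _ in range(epochs):
        for idx in np.array_split(rng.permutation(n), n // batch_size):
            lr = base_lr / np.sqrt(t)                             # step size base_lr / sqrt(t) in round t
            grad = X[idx].T @ (X[idx] @ w - y[idx]) / len(idx)    # least-squares minibatch gradient (assumed loss)
            w -= lr * grad
            t += 1
    return w

X = np.random.default_rng(0).normal(size=(800, 10))  # placeholder training data
y = np.random.default_rng(1).normal(size=800)
w = sgd_sqrt_decay(X, y)
```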
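The same row mentions k = 100 clusters with centers initialized via k-means++ (Arthur & Vassilvitskii, 2007). One way to reproduce that initialization is sketched below, assuming scikit-learn's kmeans_plusplus helper (available from version 0.24); the paper does not state which implementation was used.

```python
# Hedged sketch of the k-means++ center initialization from the
# "Experiment Setup" row (k = 100 clusters).
import numpy as np
from sklearn.cluster import kmeans_plusplus

X = np.random.default_rng(0).normal(size=(1000, 10))  # placeholder features
centers, center_indices = kmeans_plusplus(X, n_clusters=100, random_state=0)
```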