Online Variance Reduction with Mixtures
Authors: Zalán Borsos, Sebastian Curi, Kfir Yehuda Levy, Andreas Krause
ICML 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we evaluate our method experimentally. The experiments are designed to illustrate the underlying principles of the algorithm as well as the beneficial effects of variance reduction in various real-world domains. |
| Researcher Affiliation | Academia | 1Department of Computer Science, ETH Zurich. Correspondence to: Zalán Borsos <zalan.borsos@inf.ethz.ch>. |
| Pseudocode | Yes | Algorithm 1 ONS, Algorithm 2 VRM, Algorithm 3 Projection |
| Open Source Code | Yes | 1The code is available at https://github.com/zalanborsos/variance-reduction-mixtures |
| Open Datasets | Yes | We solve linear regression on a synthetic dataset of size n = 1 000 and dimension d = 10 generated as follows: the features are drawn from a multivariate normal distribution with random means and variances for each dimension. ... on the Cartpole environment of the Gym (Brockman et al., 2016). ... We train the algorithms on 80% of the data. For the mixture sampler, we perform an additional 80%-20% split of the training data, in order to choose the hyperparameters β and γ. We report the loss on the test sets of the datasets presented in Table 1 (KDD Cup 2004; Faulkner et al., 2011; Le Cun et al., 1998). An illustrative data-generation and split sketch follows the table. |
| Dataset Splits | Yes | We train the algorithms on 80% of the data. For the mixture sampler, we perform an additional 80%-20% split of the training data, in order to choose the hyperparameters β and γ. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU models, or memory specifications used for running the experiments. |
| Software Dependencies | No | The paper mentions software like "Gym", but does not specify version numbers for any software dependencies, which is required for reproducibility. |
| Experiment Setup | Yes | We run 5 epochs of online gradient descent for SVM with step size 0.01/sqrt(t) at iteration t. ... The optimization is performed with minibatch SGD with step size 10^-4/sqrt(t) in round t over 100 epochs and batch size of 5. ... ϵ = {0.01, 0.1, 1} and α = {0.1, 0.5, 0.9}. ... We use batch size b = 100 and number of clusters k = 100, and initialize the centers via k-means++ (Arthur & Vassilvitskii, 2007). An illustrative step-size schedule sketch follows the table. |
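
The Open Datasets and Dataset Splits rows quote a synthetic linear-regression setup (n = 1 000, d = 10, per-dimension random means and variances) and nested 80%-20% splits. The snippet below is a minimal sketch of that setup, not the authors' code: the ranges used for the random means and variances, the ground-truth weights, the noise level, and the random seed are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)  # assumed seed, for reproducibility of the sketch

# Synthetic regression data as described in the quote: n = 1000 points, d = 10
# features, each feature dimension with its own random mean and variance.
n, d = 1000, 10
means = rng.uniform(-1.0, 1.0, size=d)   # assumed range for the random means
stds = rng.uniform(0.5, 2.0, size=d)     # assumed range for the random std devs
X = rng.normal(loc=means, scale=stds, size=(n, d))
w_true = rng.normal(size=d)              # assumed ground-truth weights
y = X @ w_true + 0.1 * rng.normal(size=n)  # assumed additive noise

# 80%-20% train/test split, then a further 80%-20% split of the training data
# for choosing the mixture sampler's hyperparameters (beta, gamma).
perm = rng.permutation(n)
train, test = perm[: int(0.8 * n)], perm[int(0.8 * n):]
n_tr = len(train)
fit, val = train[: int(0.8 * n_tr)], train[int(0.8 * n_tr):]
print(len(fit), len(val), len(test))  # 640 / 160 / 200 points
```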
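
The Experiment Setup row quotes minibatch SGD with step size 10^-4/sqrt(t) in round t, 100 epochs, and batch size 5 (the SVM runs use the same pattern with 5 epochs and step size 0.01/sqrt(t)). The following sketch shows only that decaying step-size schedule on a toy least-squares objective; the model, data, and uniform minibatch sampling are assumptions, and the paper's mixture sampler would replace the uniform sampling.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy least-squares objective; stands in for the paper's models and datasets.
n, d = 1000, 10
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=n)

w = np.zeros(d)
batch_size = 5    # batch size quoted in the setup
epochs = 100      # number of epochs quoted in the setup
t = 0             # global iteration counter driving the step-size schedule

for _ in range(epochs):
    for _ in range(n // batch_size):
        t += 1
        # Uniform minibatch sampling (the paper's sampler would reweight this).
        idx = rng.integers(0, n, size=batch_size)
        grad = X[idx].T @ (X[idx] @ w - y[idx]) / batch_size
        w -= (1e-4 / np.sqrt(t)) * grad  # step size 10^-4 / sqrt(t), as quoted
```

The k-means++ initialization of the k = 100 cluster centers mentioned in the same row is not reproduced here; any standard implementation (e.g. the Arthur & Vassilvitskii, 2007 seeding) could be substituted.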