Fast Stochastic Bregman Gradient Methods: Sharp Analysis and Variance Reduction
Authors: Radu Alexandru Dragomir, Mathieu Even, Hadrien Hendrikx
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We illustrate the effectiveness of our approach on two key applications of relative smoothness: tomographic reconstruction with Poisson noise and statistical preconditioning for distributed optimization. ... (Section 5, Experiments) In order to show the effectiveness of our method, we consider the two key settings mentioned in the introduction: problems with unbounded curvature (inverse problems with Poisson noise) and preconditioned distributed optimization. |
| Researcher Affiliation | Academia | 1. Université Toulouse 1 Capitole; 2. D.I. École Normale Supérieure, CNRS, PSL University, Paris; 3. INRIA Paris. |
| Pseudocode | Yes | Algorithm 1 Bregman-SAGA((η_t)_{t≥0}, x_0) (see the illustrative sketch after this table) |
| Open Source Code | No | The paper does not provide any statement or link indicating that the source code for their methodology is publicly available. |
| Open Datasets | Yes | We use the log-barrier reference function, h(x) = −∑_i log(x_i), for which relative smoothness holds with L_{f/h} = (1/n) ∑_{i=1}^n b_i (Bauschke et al., 2017). ... We solve a logistic regression problem for the RCV1 dataset (Lewis et al., 2004). |
| Dataset Splits | No | The paper mentions using specific datasets but does not provide details on training, validation, or test splits, such as percentages, sample counts, or explicit instructions for partitioning the data. |
| Hardware Specification | No | The paper does not provide any specific hardware details (e.g., CPU/GPU models, memory, or cloud instances) used for running the experiments. |
| Software Dependencies | No | The paper discusses algorithms and concepts but does not list any specific software components with version numbers (e.g., Python, PyTorch, specific libraries or solvers) used for implementation or experimentation. |
| Experiment Setup | Yes | A fixed learning rate is used, and the best one is selected among [0.025, 0.05, 0.1, 0.25, 0.5, 1.]. BGD uses η = 0.5 while SAGA and BSGD use η = 0.05. The x-axis represents the total number of communications (or number of passes over the dataset). Note that at each epoch, BGD communicates once with all workers (one round trip for each worker) whereas BSGD and BSAGA communicate n times with one worker sampled uniformly at random each time. ... Regularization is taken as λ = 10⁻⁵, and there are n = 100 nodes with N = 1000 samples each. |
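For context on the "Pseudocode" and "Open Datasets" rows above, the following is a minimal, hypothetical sketch of a Bregman-SAGA-style update for a Poisson linear inverse problem with the log-barrier reference function h(x) = −∑_j log(x_j). It is a reconstruction for illustration only, not the authors' released code: the function names (`poisson_grad`, `bregman_saga`), the fixed step size, and the toy data are assumptions, and the step-size conditions from the paper's analysis are not reproduced here.

```python
import numpy as np

def poisson_grad(A, b, x, i):
    """Gradient of the i-th Poisson likelihood term f_i(x) = <a_i, x> - b_i * log(<a_i, x>)."""
    ai = A[i]
    return ai * (1.0 - b[i] / (ai @ x))

def bregman_saga(A, b, x0, eta, n_iters, seed=None):
    """Illustrative Bregman-SAGA loop with the log-barrier reference h(x) = -sum_j log(x_j).

    With this h, the mirror step argmin_x <g, x> + (1/eta) * D_h(x, x_t)
    has the closed form x_{t+1} = x_t / (1 + eta * x_t * g), componentwise.
    The step size eta must be small enough (relative to the relative-smoothness
    constant) so that the iterate stays in the positive orthant.
    """
    rng = np.random.default_rng(seed)
    n, x = A.shape[0], x0.copy()
    # SAGA memory: one stored gradient per component function, plus their running average.
    table = np.stack([poisson_grad(A, b, x, i) for i in range(n)])
    avg = table.mean(axis=0)
    for _ in range(n_iters):
        i = rng.integers(n)
        g_new = poisson_grad(A, b, x, i)
        # Variance-reduced gradient estimate.
        g = g_new - table[i] + avg
        # Update the stored gradient and the running average.
        avg += (g_new - table[i]) / n
        table[i] = g_new
        # Bregman (mirror) step for the log-barrier reference function.
        x = x / (1.0 + eta * x * g)
    return x

if __name__ == "__main__":
    # Hypothetical toy data: random positive design matrix and Poisson-like counts.
    rng = np.random.default_rng(0)
    A = rng.uniform(0.1, 1.0, size=(50, 10))
    b = rng.poisson(5.0, size=50).astype(float) + 1.0
    x_hat = bregman_saga(A, b, x0=np.ones(10), eta=0.01, n_iters=2000, seed=0)
    print(x_hat)
```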