Stochastic Optimization with Arbitrary Recurrent Data Sampling

Authors: William Powell, Hanbaek Lyu

ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We experimentally validate our results for the tasks of non-negative matrix factorization and logistic regression. We find that our method is robust to data heterogeneity as it produces stable iterate trajectories while still maintaining fast convergence (see Sec. 4.2)." A toy gradient oracle for the logistic-regression task is sketched below the table.
Researcher Affiliation | Academia | "Department of Mathematics, University of Wisconsin-Madison, WI, USA."
Pseudocode | Yes | "Algorithm 1 Incremental Majorization Minimization with Dynamic Proximal Regularization... Algorithm 2 Incremental Majorization Minimization with Diminishing Radius" A hedged sketch of the surrogate-minimization step behind these algorithms appears below the table.
Open Source Code | No | "The paper does not provide concrete access to its own source code for the methodology described."
Open Datasets | Yes | "We consider a randomly drawn collection of 5000 images from the MNIST (Deng, 2012) dataset" A sketch of one way to draw such a subset appears below the table.
Dataset Splits | No | "The paper describes how the dataset was structured for the experiments (e.g., divided into groups, batched into nodes) but does not provide explicit training, validation, and test dataset split percentages or counts needed for reproduction."
Hardware Specification | No | "The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments."
Software Dependencies | No | "The paper does not provide specific ancillary software details (e.g., library or solver names with version numbers) needed to replicate the experiment."
Experiment Setup | Yes | "We include here a list of hyperparameters used for the NMF experiments. For AdaGrad we used constant step size parameter η = 0.5. For both RMISO-DPR and RMISO-CPR we set ρ = 2500 for the random walk and ρ = 50 for cyclic sampling. For the diminishing radius version RMISO-DR we set r_n = 1/(n log(n+1)). ... The hyperparameters for the logistic regression experiments were chosen as follows. For MCSAG and RMISO/MISO we took L = 2/5. ... We ran SGD with a decaying step size of the form α_n = α/n^γ, where α = 0.1 and γ = 0.5. For SGD-HB and AdaGrad we used step sizes α = 0.05 and SGD-HB momentum parameter β = 0.9." The quoted schedules are sketched in code below the table.
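
The paper's pseudocode is not reproduced on this page, but the following is a minimal sketch of the kind of step Algorithm 1 takes: each component function is majorized by a quadratic surrogate anchored at the last iterate where it was sampled, and the next iterate minimizes the average surrogate plus a proximal term. The constant proximal weight `rho`, the Lipschitz-gradient surrogate form, and all function names are our assumptions; the paper's dynamic proximal weighting (and the diminishing-radius variant of Algorithm 2) is not reproduced here.

```python
import numpy as np

def init_state(theta0, n, grad_f):
    """Per-component surrogate bookkeeping: anchor points and stored gradients."""
    anchors = np.tile(theta0, (n, 1))
    grads = np.stack([grad_f(i, theta0) for i in range(n)])
    return {"anchors": anchors, "grads": grads,
            "mean_a": anchors.mean(axis=0), "mean_g": grads.mean(axis=0)}

def miso_prox_step(i, theta, state, grad_f, L, rho):
    """One incremental MM step: refresh the surrogate of the sampled component
    f_i at the current iterate, then minimize the average surrogate plus the
    proximal term (rho/2) * ||t - theta||^2.  With quadratic surrogates
    g_i(t) = f_i(a_i) + <grad f_i(a_i), t - a_i> + (L/2) * ||t - a_i||^2,
    the minimizer is available in closed form."""
    n = state["anchors"].shape[0]
    old_a = state["anchors"][i].copy()
    old_g = state["grads"][i].copy()
    state["anchors"][i] = theta
    state["grads"][i] = grad_f(i, theta)
    # Keep running means of anchors and stored gradients up to date in O(d).
    state["mean_a"] += (theta - old_a) / n
    state["mean_g"] += (state["grads"][i] - old_g) / n
    # First-order condition of the regularized average surrogate:
    return (L * state["mean_a"] - state["mean_g"] + rho * theta) / (L + rho)
```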
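
The two test problems are only named in the review above. As a concrete stand-in for the `grad_f` oracle in the sketch for the logistic-regression task, one could use a per-sample logistic gradient such as the following; this is entirely our construction, not the paper's code.

```python
import numpy as np

def logistic_grad(i, theta, X, y):
    """Gradient of f_i(theta) = log(1 + exp(-y_i * <x_i, theta>)), y_i in {-1, +1}."""
    margin = y[i] * (X[i] @ theta)
    return (-y[i] / (1.0 + np.exp(margin))) * X[i]

# Illustrative tie-in with the sketch above; L = 2/5 echoes the quoted value,
# while rho here is arbitrary:
# theta = miso_prox_step(i, theta, state,
#                        lambda j, t: logistic_grad(j, t, X, y), L=0.4, rho=1.0)
```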
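
The paper states the MNIST subset size but not how the subset is drawn or loaded. A minimal loading sketch, assuming scikit-learn's `fetch_openml` and a seed of our own choosing:

```python
import numpy as np
from sklearn.datasets import fetch_openml

rng = np.random.default_rng(0)  # seed is arbitrary; the paper does not state one

# Fetch the full 70,000-image MNIST set, then draw 5,000 images at random.
X, y = fetch_openml("mnist_784", version=1, return_X_y=True, as_frame=False)
idx = rng.choice(X.shape[0], size=5000, replace=False)
X_sub, y_sub = X[idx] / 255.0, y[idx].astype(int)
```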
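
The quoted hyperparameter schedules are simple to state in code. The default values below are taken from the quote; the reading of r_n is our reconstruction of a garbled formula in the extracted text and should be treated as tentative, and the AdaGrad stabilizer `eps` is a conventional choice of ours.

```python
import numpy as np

def sgd_step_size(n, alpha=0.1, gamma=0.5):
    """Decaying SGD step size alpha_n = alpha / n**gamma (values quoted above)."""
    return alpha / n ** gamma

def heavy_ball_update(theta, theta_prev, grad, alpha=0.05, beta=0.9):
    """SGD with heavy-ball momentum, at the quoted alpha and beta."""
    return theta - alpha * grad + beta * (theta - theta_prev)

def adagrad_update(theta, grad, accum, eta=0.5, eps=1e-8):
    """Standard AdaGrad step; eta = 0.5 is the quoted NMF value (the logistic
    experiments use 0.05), and eps is a conventional stabilizer (our choice)."""
    accum = accum + grad ** 2
    return theta - eta * grad / (np.sqrt(accum) + eps), accum

def dr_radius(n):
    """Diminishing radius r_n = 1 / (n * log(n + 1)) for RMISO-DR; this is our
    reading of the extracted formula, so treat it as an assumption."""
    return 1.0 / (n * np.log(n + 1))
```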