Efficient Continual Finite-Sum Minimization

Authors: Ioannis Mavrothalassitis, Stratis Skoulakis, Leello Tadesse Dadi, Volkan Cevher

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 6 EXPERIMENTS. We experimentally evaluate the methods (SGD, SGD-sparse, Katyusha, SVRG and CSVRG) on a ridge regression task. Given some dataset $(a_i, b_i)_{i=1}^{n} \subseteq \mathbb{R}^d \times \mathbb{R}$, at each stage $i \in [n]$ we consider the finite-sum objective $g_i(x) := \sum_{j=1}^{i} (a_j^\top x - b_j)^2 / i + \lambda \|x\|_2^2$ with $\lambda = 10^{-3}$. We choose the latter setting so as to be able to compute the exact optimal solution at each stage $i \in [n]$. For our experiments we use the datasets found in the LIBSVM package (Chang and Lin, 2011), for which we report our results. (A runnable sketch of this objective and its closed-form per-stage optimum follows the table.)
Researcher Affiliation | Academia | Ioannis Mavrothalassitis, Stratis Skoulakis, Leello Tadesse Dadi, Volkan Cevher; LIONS, École Polytechnique Fédérale de Lausanne; {ioannis.mavrothalassitis, efstratios.skoulakis, leello.dadi, volkan.cevher}@epfl.ch
Pseudocode | Yes | Algorithm 1: CSVRG
Open Source Code | Yes | Reproducibility Statement: In the appendix we have included the formal proofs of all the theorems and lemmas provided in the main part of the paper. In order to facilitate the reproducibility of our theoretical results, for each theorem we have created a separate self-contained section presenting its proof. Concerning the experimental evaluations of our work, we provide the code used in our experiments as well as the selected parameters for each of the presented methods.
Open Datasets | Yes | For our experiments we use the datasets found in the LIBSVM package (Chang and Lin, 2011), for which we report our results. ... In this section we include additional experimental evaluations of CSVRG, SGD and SVRG for the continual finite-sum setting in the context of 2-layer neural networks and the MNIST dataset.
Dataset Splits | No | No explicit statement about train/validation/test splits, specific percentages, or cross-validation setup. The paper mentions processing data points in 'stages' where new data arrives over time.
Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory specifications) are mentioned for running the experiments. The paper focuses on algorithmic complexity and experimental results on datasets.
Software Dependencies | No | No specific software dependencies with version numbers are provided. While the paper mentions the LIBSVM package and implies deep learning frameworks for the neural-network experiments, it does not specify versions (e.g., 'PyTorch 1.9').
Experiment Setup | Yes | For our experiments we use the datasets found in the LIBSVM package (Chang and Lin, 2011), for which we report our results. At each stage $i \in [n]$, we reveal a new data point $(a_i, b_i)$. In all of our experiments we run CSVRG with $\alpha = 0.3$ and $T_i = 100$. The inner iterations of SGD and SGD-sparse are appropriately selected so that their overall first-order oracle (FO) calls match the FO calls of CSVRG. At each stage $i \in [n]$ we run Katyusha (Allen-Zhu, 2017) and SVRG (Johnson and Zhang, 2013) on the prefix-sum function $g_i(x)$ with 10 outer iterations and 100 inner iterations. In Appendix K.2 we present further experimental evaluations as well as the exact parameters used for each method. (An SVRG sketch for a single stage follows the table.)
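
The per-stage objective quoted above is plain ridge regression over the first $i$ points, which is why the exact optimum is computable in closed form at every stage: setting $\nabla g_i(x) = 0$ gives $(A^\top A / i + \lambda I)\, x = A^\top b / i$. The following is a minimal sketch of that computation, assuming NumPy and synthetic data; the names (stage_objective, stage_optimum, A_full, b_full) are ours for illustration, not from the paper or its released code.

```python
import numpy as np

def stage_objective(x, A, b, lam=1e-3):
    """g_i(x) = (1/i) * sum_j (a_j^T x - b_j)^2 + lam * ||x||_2^2,
    where the rows of A are the first i data points a_1..a_i."""
    i = A.shape[0]
    return np.sum((A @ x - b) ** 2) / i + lam * np.dot(x, x)

def stage_optimum(A, b, lam=1e-3):
    """Closed-form ridge minimizer of g_i:
    solve (A^T A / i + lam * I) x = A^T b / i."""
    i, d = A.shape
    return np.linalg.solve(A.T @ A / i + lam * np.eye(d), A.T @ b / i)

# Toy usage: reveal points stage by stage and evaluate the exact optimum.
rng = np.random.default_rng(0)
A_full = rng.normal(size=(50, 5))
b_full = A_full @ rng.normal(size=5) + 0.1 * rng.normal(size=50)
for i in (10, 25, 50):
    x_star = stage_optimum(A_full[:i], b_full[:i])
    print(i, stage_objective(x_star, A_full[:i], b_full[:i]))
```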
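The reported baselines run SVRG (Johnson and Zhang, 2013) on each prefix-sum function $g_i$ with 10 outer and 100 inner iterations. Below is a hedged single-stage SVRG sketch under the same ridge objective, reusing A_full and b_full from the sketch above; the step size eta is an illustrative choice of ours, and the paper's exact per-method parameters are in its Appendix K.2.

```python
def svrg_stage(A, b, lam=1e-3, outer=10, inner=100, eta=0.01, rng=None):
    """SVRG on g_i(x) = (1/i) * sum_j f_j(x) + lam * ||x||_2^2 with
    f_j(x) = (a_j^T x - b_j)^2. Each outer iteration takes a full
    gradient at a snapshot, then `inner` variance-reduced steps."""
    rng = rng or np.random.default_rng(0)
    i, d = A.shape
    x = np.zeros(d)

    def grad_j(x, j):
        # Gradient of f_j plus the regularizer term.
        return 2.0 * (A[j] @ x - b[j]) * A[j] + 2.0 * lam * x

    for _ in range(outer):
        snapshot = x.copy()
        full_grad = 2.0 * A.T @ (A @ snapshot - b) / i + 2.0 * lam * snapshot
        for _ in range(inner):
            j = rng.integers(i)
            # Variance-reduced stochastic gradient; unbiased for grad g_i(x).
            x -= eta * (grad_j(x, j) - grad_j(snapshot, j) + full_grad)
    return x

# Usage: approximate the stage-50 optimum and compare with the exact one.
x_svrg = svrg_stage(A_full[:50], b_full[:50])
print(stage_objective(x_svrg, A_full[:50], b_full[:50]))
```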