Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Stochastic DCA with Variance Reduction and Applications in Machine Learning

Authors: Hoai An Le Thi, Hoang Phuc Hau Luu, Hoai Minh Le, Tao Pham Dinh

JMLR 2022

Reproducibility variables, the assigned result, and the supporting LLM response:
Research Type: Experimental
LLM Response: "To study the efficiency of our algorithms, we apply them to three important problems in machine learning: nonnegative principal component analysis, group variable selection in multiclass logistic regression, and sparse linear regression. Numerical experiments have shown the merits of our proposed algorithms in comparison with other state-of-the-art stochastic methods for solving nonconvex large-sum problems."
Researcher Affiliation: Academia
LLM Response: "Hoai An Le Thi (EMAIL), Université de Lorraine, LGIPM, Département IA, F-57000 Metz, France; Institut Universitaire de France (IUF) ... Tao Pham Dinh (EMAIL), Laboratory of Mathematics, INSA-Rouen, University of Normandie, 76801 Saint-Étienne-du-Rouvray Cedex, France"
Pseudocode: Yes
LLM Response: "Algorithm 1 DCA-SVRG. Initialization: x0 ∈ dom r1, inner-loop length M, minibatch size b, k = 0, option (either with replacement or without replacement). repeat ... Algorithm 2 DCA-SAGA ... Algorithm 3 DCA-SVRG applied to (Q')"
Open Source Code: No
LLM Response: "The paper does not contain any explicit statement about the release of source code or provide a link to a code repository. It refers to third-party tools or algorithms but not its own implementation code."
Open Datasets: Yes
LLM Response: "We use standard machine learning data sets in LIBSVM, namely, a9a (32561 × 123), aloi (108000 × 128), cifar10 (50000 × 3072), SensIT Vehicle (78823 × 100), connect-4 (67557 × 126), letter (15000 × 16), mnist (60000 × 780), protein (17766 × 357), shuttle (43500 × 9), Year Prediction MSD (463715 × 90). The data sets can be downloaded from https://www.csie.ntu.edu.tw/~cjlin/libsvm/."
Dataset Splits: No
LLM Response: "The paper mentions using 'training set' and normalizations for datasets, but does not provide explicit training/test/validation dataset splits (e.g., percentages, sample counts, or references to standard splits for the listed datasets)."
Hardware Specification: Yes
LLM Response: "All numerical experiments in this section are performed on an Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz with 16 GB of RAM."
Software Dependencies: No
LLM Response: "The paper does not specify any software dependencies with version numbers (e.g., programming languages, libraries, or frameworks with specific versions)."
Experiment Setup: Yes
LLM Response: "The minibatch size b is chosen as N^(2/3), N^(2/3), 2 * N^(5/4), 2 * sqrt(N + 1) for DCA-SVRG-v1, DCA-SVRG-v2, DCA-SAGA-v1, DCA-SAGA-v2, respectively. We set the inner loop length M for DCA-SVRG-v1 and DCA-SVRG-v2 to be (1/4) * e * (1/b). The fixed budget of SFO calls is 15N. For prox-SGD, η = 1/(2L) (Ghadimi et al., 2016) and we choose a neutral minibatch size of 500."
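The Pseudocode row above references DCA-SVRG, whose inner loop combines a minibatch gradient correction with a full gradient computed at a snapshot point. A minimal sketch of the generic SVRG-type variance-reduced gradient estimate (function and variable names are illustrative, not the paper's notation, and this is the standard SVRG estimator rather than the paper's full DCA scheme):

```python
import numpy as np

def svrg_gradient(grad_i, x, x_snap, full_grad_snap, batch):
    """SVRG-type variance-reduced gradient estimate.

    grad_i(i, x): gradient of the i-th component function at x.
    full_grad_snap: full gradient (1/N) * sum_i grad_i(i, x_snap),
    recomputed once per outer iteration at the snapshot x_snap.
    The estimate is unbiased when the minibatch is sampled uniformly.
    """
    g = np.zeros_like(x)
    for i in batch:
        g += grad_i(i, x) - grad_i(i, x_snap)
    return g / len(batch) + full_grad_snap
```

At x = x_snap the correction terms cancel and the estimate equals the full gradient exactly; as the iterate drifts from the snapshot the variance grows, which is why the inner-loop length M is bounded.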
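The Open Datasets row lists files in LIBSVM's sparse text format, which scikit-learn can read with load_svmlight_file. A short sketch using a tiny in-memory example (the toy data here is mine; a real run would pass the path of a downloaded file such as a9a):

```python
import io
from sklearn.datasets import load_svmlight_file

# LIBSVM format: "<label> <index>:<value> ...", one sample per line,
# with one-based feature indices and zeros omitted.
toy = b"1 1:0.5 3:1.2\n-1 2:0.3\n"
X, y = load_svmlight_file(io.BytesIO(toy))
# X is a sparse CSR matrix of shape (2, 3); y holds the labels.
```

The dimensions quoted in the table, e.g. a9a (32561 × 123), are exactly the (samples × features) shape this loader returns for the corresponding file.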