Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Stochastic DCA with Variance Reduction and Applications in Machine Learning
Authors: Hoai An Le Thi, Hoang Phuc Hau Luu, Hoai Minh Le, Tao Pham Dinh
JMLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To study the efficiency of our algorithms, we apply them to three important problems in machine learning: nonnegative principal component analysis, group variable selection in multiclass logistic regression, and sparse linear regression. Numerical experiments have shown the merits of our proposed algorithms in comparison with other state-of-the-art stochastic methods for solving nonconvex large-sum problems. |
| Researcher Affiliation | Academia | Hoai An Le Thi: Université de Lorraine, LGIPM, Département IA, F-57000 Metz, France; Institut Universitaire de France (IUF) ... Tao Pham Dinh: Laboratory of Mathematics, INSA-Rouen, University of Normandie, 76801 Saint Etienne-du-Rouvray Cedex, France |
| Pseudocode | Yes | Algorithm 1 DCA-SVRG. Initialization: x⁰ ∈ dom r₁, inner-loop length M, minibatch size b, k = 0, option (either with replacement or without replacement). repeat ... Algorithm 2 DCA-SAGA ... Algorithm 3 DCA-SVRG applied to (Q') |
| Open Source Code | No | The paper does not contain any explicit statement about the release of source code or provide a link to a code repository. It refers to third-party tools or algorithms but not its own implementation code. |
| Open Datasets | Yes | We use standard machine learning data sets in LIBSVM¹, namely, a9a (32561 × 123), aloi (108000 × 128), cifar10 (50000 × 3072), SensIT Vehicle (78823 × 100), connect-4 (67557 × 126), letter (15000 × 16), mnist (60000 × 780), protein (17766 × 357), shuttle (43500 × 9), YearPredictionMSD (463715 × 90). ¹The data sets can be downloaded from https://www.csie.ntu.edu.tw/~cjlin/libsvm/. |
| Dataset Splits | No | The paper mentions using 'training set' and normalizations for datasets, but does not provide explicit training/test/validation dataset splits (e.g., percentages, sample counts, or references to standard splits for the listed datasets). |
| Hardware Specification | Yes | All numerical experiments in this section are performed on a Processor Intel(R) core(TM) i7-8700, CPU @ 3.20GHz, RAM 16 GB. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers (e.g., programming languages, libraries, or frameworks with specific versions). |
| Experiment Setup | Yes | The minibatch size b is chosen as N^(2/3), N^(2/3), 2 * N^(5/4), 2 * sqrt(N + 1) for DCA-SVRG-v1, DCA-SVRG-v2, DCA-SAGA-v1, DCA-SAGA-v2, respectively. We set the inner loop length M for DCA-SVRG-v1 and DCA-SVRG-v2 to be (1/4) * e * (1/b). The fixed budget of SFO calls is set to 15N. For prox-SGD, η = 1/(2L) (Ghadimi et al., 2016), and we choose a neutral minibatch size of 500. |
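Since no source code is released, the following is a minimal, hypothetical sketch of what a DCA step with an SVRG gradient estimator can look like. It is not the authors' implementation. It assumes a sparse least-squares problem F(x) = λ‖x‖₁ + (1/N) Σᵢ ½(aᵢ·x − yᵢ)², the DC decomposition G(x) = λ‖x‖₁ + (ρ/2)‖x‖² and H(x) = (ρ/2)‖x‖² − f(x) with ρ = maxᵢ ‖aᵢ‖², and that each DCA subproblem is solved in closed form by soft-thresholding. All function names (`dca_svrg_sketch`, `grad_i`, and so on) are illustrative. The inner-loop length `M`, minibatch size `b`, and the with/without-replacement option mirror the inputs listed for Algorithm 1.

```python
import random


def grad_i(a, y, x):
    """Gradient of the assumed component f_i(x) = 0.5 * (a.x - y)^2."""
    r = sum(aj * xj for aj, xj in zip(a, x)) - y
    return [r * aj for aj in a]


def full_grad(A, Y, x):
    """Full gradient (1/N) * sum_i grad f_i(x), recomputed at each SVRG snapshot."""
    n = len(A)
    g = [0.0] * len(x)
    for a, y in zip(A, Y):
        for j, gj in enumerate(grad_i(a, y, x)):
            g[j] += gj / n
    return g


def soft_threshold(z, tau):
    """Proximal operator of tau * ||.||_1, applied coordinatewise."""
    return [max(abs(v) - tau, 0.0) * (1.0 if v >= 0 else -1.0) for v in z]


def objective(A, Y, x, lam):
    """F(x) = lam * ||x||_1 + (1/N) * sum_i f_i(x)."""
    loss = sum(0.5 * (sum(aj * xj for aj, xj in zip(a, x)) - y) ** 2
               for a, y in zip(A, Y)) / len(A)
    return loss + lam * sum(abs(v) for v in x)


def dca_svrg_sketch(A, Y, lam, epochs=50, M=None, b=4, replace=True, seed=0):
    """Hypothetical DCA-SVRG sketch for F = G - H with the assumed decomposition
    G(x) = lam*||x||_1 + (rho/2)||x||^2 and H(x) = (rho/2)||x||^2 - f(x)."""
    rng = random.Random(seed)
    n, d = len(A), len(A[0])
    # rho >= max_i ||a_i||^2 bounds the Lipschitz constant of grad f,
    # so H is convex and G - H is a valid DC decomposition.
    rho = max(sum(aj * aj for aj in a) for a in A)
    M = M or max(1, n // b)
    x = [0.0] * d
    for _ in range(epochs):
        snapshot = x[:]
        mu = full_grad(A, Y, snapshot)
        for _ in range(M):
            # Minibatch drawn with or without replacement (the "option" input).
            batch = ([rng.randrange(n) for _ in range(b)] if replace
                     else rng.sample(range(n), b))
            # SVRG estimate v of grad f(x); rho*x - v then estimates grad H(x).
            v = mu[:]
            for i in batch:
                gx, gs = grad_i(A[i], Y[i], x), grad_i(A[i], Y[i], snapshot)
                for j in range(d):
                    v[j] += (gx[j] - gs[j]) / b
            # DCA step: argmin_z G(z) - <rho*x - v, z>, solved by soft-thresholding.
            x = soft_threshold([xj - vj / rho for xj, vj in zip(x, v)], lam / rho)
    return x
```

Under this particular decomposition, the convex subproblem reduces to a proximal step with step size 1/ρ. That is why the sketch is short. The paper's other applications (nonnegative PCA, group variable selection) would need different G, H, and subproblem solvers.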
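To make the quoted minibatch rules concrete, the snippet below evaluates two of them, b = N^(2/3) for DCA-SVRG-v1/v2 and b = 2 * sqrt(N + 1) for DCA-SAGA-v2, on the sample counts N listed in the Open Datasets row. Rounding to the nearest integer is an assumption, because the source does not state how non-integer values are rounded. The names `DATASETS` and `minibatch_sizes` are hypothetical.

```python
import math

# (N, d) pairs quoted in the Open Datasets row: samples x features.
DATASETS = {
    "a9a": (32561, 123), "aloi": (108000, 128), "cifar10": (50000, 3072),
    "SensIT Vehicle": (78823, 100), "connect-4": (67557, 126),
    "letter": (15000, 16), "mnist": (60000, 780), "protein": (17766, 357),
    "shuttle": (43500, 9), "YearPredictionMSD": (463715, 90),
}


def minibatch_sizes(n):
    """Evaluate two of the quoted minibatch rules for a data set with n samples.

    Rounding to the nearest integer is an assumption, not stated in the source.
    """
    return {
        "DCA-SVRG-v1/v2": round(n ** (2 / 3)),       # b = N^(2/3)
        "DCA-SAGA-v2": round(2 * math.sqrt(n + 1)),  # b = 2 * sqrt(N + 1)
    }


for name, (n, _d) in DATASETS.items():
    print(f"{name:>18}  N={n:>6}  {minibatch_sizes(n)}")
```

For a9a (N = 32561), for example, this gives b ≈ 1020 for the DCA-SVRG variants and b ≈ 361 for DCA-SAGA-v2.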