Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Stochastic DCA with Variance Reduction and Applications in Machine Learning

Authors: Hoai An Le Thi, Hoang Phuc Hau Luu, Hoai Minh Le, Tao Pham Dinh

JMLR 2022

Reproducibility variables, the assigned result, and the supporting LLM response:
Research Type: Experimental
LLM Response: "To study the efficiency of our algorithms, we apply them to three important problems in machine learning: nonnegative principal component analysis, group variable selection in multiclass logistic regression, and sparse linear regression. Numerical experiments have shown the merits of our proposed algorithms in comparison with other state-of-the-art stochastic methods for solving nonconvex large-sum problems."
Researcher Affiliation: Academia
LLM Response: "Hoai An Le Thi (EMAIL), Université de Lorraine, LGIPM, Département IA, F-57000 Metz, France; Institut Universitaire de France (IUF) ... Tao Pham Dinh (EMAIL), Laboratory of Mathematics, INSA-Rouen, University of Normandie, 76801 Saint-Étienne-du-Rouvray Cedex, France"
Pseudocode: Yes
LLM Response: "Algorithm 1 DCA-SVRG. Initialization: x0 ∈ dom r1, inner-loop length M, minibatch size b, k = 0, option (either with replacement or without replacement). repeat ... Algorithm 2 DCA-SAGA ... Algorithm 3 DCA-SVRG applied to (Q')"
Open Source Code: No
LLM Response: "The paper does not contain any explicit statement about the release of source code or provide a link to a code repository. It refers to third-party tools or algorithms but not its own implementation code."
Open Datasets: Yes
LLM Response: "We use standard machine learning data sets in LIBSVM, namely, a9a (32561 × 123), aloi (108000 × 128), cifar10 (50000 × 3072), SensIT Vehicle (78823 × 100), connect-4 (67557 × 126), letter (15000 × 16), mnist (60000 × 780), protein (17766 × 357), shuttle (43500 × 9), Year Prediction MSD (463715 × 90). The data sets can be downloaded from https://www.csie.ntu.edu.tw/~cjlin/libsvm/."
Dataset Splits: No
LLM Response: "The paper mentions using 'training set' and normalizations for datasets, but does not provide explicit training/test/validation dataset splits (e.g., percentages, sample counts, or references to standard splits for the listed datasets)."
Hardware Specification: Yes
LLM Response: "All numerical experiments in this section are performed on an Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz with 16 GB of RAM."
Software Dependencies: No
LLM Response: "The paper does not specify any software dependencies with version numbers (e.g., programming languages, libraries, or frameworks with specific versions)."
Experiment Setup: Yes
LLM Response: "The minibatch size b is chosen as N^(2/3), N^(2/3), 2 * N^(5/4), 2 * sqrt(N + 1) for DCA-SVRG-v1, DCA-SVRG-v2, DCA-SAGA-v1, DCA-SAGA-v2, respectively. We set the inner loop length M for DCA-SVRG-v1 and DCA-SVRG-v2 to be (1/4) * e * (1/b). The fixed budget of SFO calls is 15N. For prox-SGD, η = 1/(2L) (Ghadimi et al., 2016) and we choose a neutral minibatch size of 500."
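The Pseudocode row above references DCA-SVRG, whose inner loop combines a minibatch gradient correction with a full gradient computed at a snapshot point. A minimal sketch of the generic SVRG-type variance-reduced gradient estimate (function and variable names are illustrative, not the paper's notation, and this is the standard SVRG estimator rather than the paper's full DCA scheme):

```python
import numpy as np

def svrg_gradient(grad_i, x, x_snap, full_grad_snap, batch):
    """SVRG-type variance-reduced gradient estimate.

    grad_i(i, x): gradient of the i-th component function at x.
    full_grad_snap: full gradient (1/N) * sum_i grad_i(i, x_snap),
    recomputed once per outer iteration at the snapshot x_snap.
    The estimate is unbiased when the minibatch is sampled uniformly.
    """
    g = np.zeros_like(x)
    for i in batch:
        g += grad_i(i, x) - grad_i(i, x_snap)
    return g / len(batch) + full_grad_snap
```

At x = x_snap the correction terms cancel and the estimate equals the full gradient exactly; as the iterate drifts from the snapshot the variance grows, which is why the inner-loop length M is bounded.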
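The Open Datasets row lists files in LIBSVM's sparse text format, which scikit-learn can read with load_svmlight_file. A short sketch using a tiny in-memory example (the toy data here is mine; a real run would pass the path of a downloaded file such as a9a):

```python
import io
from sklearn.datasets import load_svmlight_file

# LIBSVM format: "<label> <index>:<value> ...", one sample per line,
# with one-based feature indices and zeros omitted.
toy = b"1 1:0.5 3:1.2\n-1 2:0.3\n"
X, y = load_svmlight_file(io.BytesIO(toy))
# X is a sparse CSR matrix of shape (2, 3); y holds the labels.
```

The dimensions quoted in the table, e.g. a9a (32561 × 123), are exactly the (samples × features) shape this loader returns for the corresponding file.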