Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Pivotal Estimation of Linear Discriminant Analysis in High Dimensions

Authors: Ethan X. Fang, Yajun Mei, Yuyang Shi, Qunzhi Xu, Tuo Zhao

JMLR 2023 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type: Experimental
"Our theoretical results are backed up by thorough numerical studies using both simulated and real datasets. In comparison with the existing methods, we observe that our proposed PANDA yields equal or better performance, and requires substantially less effort in parameter tuning."

Researcher Affiliation: Academia
"Ethan X. Fang EMAIL, Yajun Mei EMAIL, Yuyang Shi EMAIL, Qunzhi Xu EMAIL, Tuo Zhao EMAIL. Department of Biostatistics and Bioinformatics, Duke University. School of Industrial and Systems Engineering, Georgia Tech."

Pseudocode: Yes
"Algorithm 1: ADMM with proximal method for solving problem (6)"

Open Source Code: No
The paper describes Algorithm 1 for solving the optimization problem, but it provides neither a link to an open-source code repository nor an explicit statement that the code for their implementation of PANDA is being released.

Open Datasets: Yes
"Our theoretical results are backed up by thorough numerical studies using both simulated and real datasets. We investigate the performance of the PANDA, LPD, and Ada LDA methods on a Leukemia dataset from high-density oligonucleotide microarrays. This dataset was first analyzed by Golub et al. (1999)."

Dataset Splits: Yes
"In our experiments, under each setting, we randomly sample a validation dataset with n = 200 data points from each class. Specifically, the training set contains 29 ALL and 15 AML samples, the validation set contains 9 ALL and 5 AML samples, and the testing set contains 9 ALL and 5 AML samples."

Hardware Specification: Yes
"Running Time: Table 4 summarizes the running time of our PANDA method and the Ada LDA method under the Varying Diagonal model on a regular computer (Intel Core i5, 2.3 GHz)."

Software Dependencies: No
"For both methods we use Gurobi, a commercial software package that provides state-of-the-art solvers for linear programming and second-order cone programming, to solve the optimization problems."

Experiment Setup: Yes
"We follow the settings in Cai and Zhang (2019) to generate Σ and β. Motivated by the choice of λ in (7), we let λ = λ̃√(log p / n), and we tune the parameter λ̃, which is equivalent to tuning λ. For a fair comparison, for all three methods (LPD, Ada LDA, and PANDA) we tune λ̃ by a grid search over a range from 0.1 to 8.0, with a grid size of 0.1. For the parameter c in the PANDA method, we observe that the results are insensitive to the value of c as long as c is not too small; see Table 1 for the misclassification rate with different choices of c under the AR(1) model as an example. Therefore, we set c = 20 for all settings. We compute the average of True Positive and True Negative rates, together with the Precision and Recall for identifying the non-zero entries in β, after applying a threshold of 0.01 to the entries of β̂."

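The ADMM-with-proximal-step structure named in the Pseudocode row can be illustrated with a generic sketch. The snippet below solves a standard lasso problem via ADMM, where the z-update is the proximal (soft-thresholding) step; it is an illustrative stand-in for that algorithmic pattern, not a reproduction of the paper's Algorithm 1 or its problem (6).

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (elementwise soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def admm_lasso(A, b, lam, rho=1.0, n_iter=300):
    """Generic ADMM for min_x 0.5 * ||Ax - b||^2 + lam * ||x||_1.

    Splits x = z: the x-update is a ridge-type linear solve, the z-update
    is the proximal step, and u is the scaled dual variable.
    """
    n = A.shape[1]
    z = np.zeros(n)
    u = np.zeros(n)
    AtA_reg = A.T @ A + rho * np.eye(n)
    Atb = A.T @ b
    for _ in range(n_iter):
        x = np.linalg.solve(AtA_reg, Atb + rho * (z - u))  # x-update
        z = soft_threshold(x + u, lam / rho)               # proximal z-update
        u = u + x - z                                      # dual update
    return z
```

With A equal to the identity matrix, the lasso solution is exactly the soft-thresholding of b, which gives a convenient sanity check for the iteration.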
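The tuning and evaluation procedure quoted in the Experiment Setup row (a grid search over λ̃ from 0.1 to 8.0 in steps of 0.1, then support-recovery metrics computed after thresholding the entries of β̂ at 0.01) can be sketched as follows. The names `support_metrics` and `lambda_grid` are hypothetical, introduced here only for illustration.

```python
import numpy as np

def support_metrics(beta_true, beta_hat, thresh=0.01):
    """Support-recovery metrics: threshold |beta_hat| at `thresh` and compare
    the recovered support against the true non-zero entries of beta_true."""
    s_true = np.abs(beta_true) > 0
    s_hat = np.abs(beta_hat) > thresh
    tp = np.sum(s_hat & s_true)    # true non-zeros recovered
    tn = np.sum(~s_hat & ~s_true)  # true zeros recovered
    fp = np.sum(s_hat & ~s_true)
    fn = np.sum(~s_hat & s_true)
    tpr = tp / max(tp + fn, 1)
    tnr = tn / max(tn + fp, 1)
    return {
        "avg_tp_tn": 0.5 * (tpr + tnr),      # average of TP and TN rates
        "precision": tp / max(tp + fp, 1),
        "recall": tpr,
    }

# Candidate lambda-tilde values: 0.1 to 8.0 with grid size 0.1 (80 points).
lambda_grid = np.round(np.arange(0.1, 8.0 + 1e-9, 0.1), 1)
```

In the quoted setup, each grid value would be used to fit the classifier and scored on the validation set; the metrics above are then reported for the selected fit.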