Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Pivotal Estimation of Linear Discriminant Analysis in High Dimensions

Authors: Ethan X. Fang, Yajun Mei, Yuyang Shi, Qunzhi Xu, Tuo Zhao

JMLR 2023 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type: Experimental
"Our theoretical results are backed up by thorough numerical studies using both simulated and real datasets. In comparison with the existing methods, we observe that our proposed PANDA yields equal or better performance, and requires substantially less effort in parameter tuning."

Researcher Affiliation: Academia
"Ethan X. Fang EMAIL, Yajun Mei EMAIL, Yuyang Shi EMAIL, Qunzhi Xu EMAIL, Tuo Zhao EMAIL. Department of Biostatistics and Bioinformatics, Duke University. School of Industrial and Systems Engineering, Georgia Tech."

Pseudocode: Yes
"Algorithm 1: ADMM with proximal method for solving problem (6)"

Open Source Code: No
The paper describes Algorithm 1 for solving the optimization problem, but it provides neither a link to an open-source code repository nor an explicit statement that the code for their implementation of PANDA is being released.

Open Datasets: Yes
"Our theoretical results are backed up by thorough numerical studies using both simulated and real datasets. We investigate the performance of the PANDA, LPD, and Ada LDA methods on a Leukemia dataset from high-density oligonucleotide microarrays. This dataset was first analyzed by Golub et al. (1999)."

Dataset Splits: Yes
"In our experiments, under each setting, we randomly sample a validation dataset with n = 200 data points from each class. Specifically, the training set contains 29 ALL and 15 AML samples, the validation set contains 9 ALL and 5 AML samples, and the testing set contains 9 ALL and 5 AML samples."

Hardware Specification: Yes
"Running Time: Table 4 summarizes the running time of our PANDA method and the Ada LDA method under the Varying Diagonal model on a regular computer (Intel Core i5, 2.3 GHz)."

Software Dependencies: No
"For both methods we use Gurobi, a commercial software package that provides state-of-the-art solvers for linear programming and second-order cone programming, to solve the optimization problems."

Experiment Setup: Yes
"We follow the settings in Cai and Zhang (2019) to generate Σ and β. Motivated by the choice of λ in (7), we let λ = λ̃√(log p / n), and we tune the parameter λ̃, which is equivalent to tuning λ. For a fair comparison, for all three methods (LPD, Ada LDA, and PANDA) we tune λ̃ by a grid search over a range from 0.1 to 8.0, with a grid size of 0.1. For the parameter c in the PANDA method, we observe that the results are insensitive to the value of c as long as c is not too small; see Table 1 for the misclassification rate with different choices of c under the AR(1) model as an example. Therefore, we set c = 20 for all settings. We compute the average of True Positive and True Negative rates, together with the Precision and Recall for identifying the non-zero entries in β, after applying a threshold of 0.01 to the entries of β̂."

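The ADMM-with-proximal-step structure named in the Pseudocode row can be illustrated with a generic sketch. The snippet below solves a standard lasso problem via ADMM, where the z-update is the proximal (soft-thresholding) step; it is an illustrative stand-in for that algorithmic pattern, not a reproduction of the paper's Algorithm 1 or its problem (6).

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (elementwise soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def admm_lasso(A, b, lam, rho=1.0, n_iter=300):
    """Generic ADMM for min_x 0.5 * ||Ax - b||^2 + lam * ||x||_1.

    Splits x = z: the x-update is a ridge-type linear solve, the z-update
    is the proximal step, and u is the scaled dual variable.
    """
    n = A.shape[1]
    z = np.zeros(n)
    u = np.zeros(n)
    AtA_reg = A.T @ A + rho * np.eye(n)
    Atb = A.T @ b
    for _ in range(n_iter):
        x = np.linalg.solve(AtA_reg, Atb + rho * (z - u))  # x-update
        z = soft_threshold(x + u, lam / rho)               # proximal z-update
        u = u + x - z                                      # dual update
    return z
```

With A equal to the identity matrix, the lasso solution is exactly the soft-thresholding of b, which gives a convenient sanity check for the iteration.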
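The tuning and evaluation procedure quoted in the Experiment Setup row (a grid search over λ̃ from 0.1 to 8.0 in steps of 0.1, then support-recovery metrics computed after thresholding the entries of β̂ at 0.01) can be sketched as follows. The names `support_metrics` and `lambda_grid` are hypothetical, introduced here only for illustration.

```python
import numpy as np

def support_metrics(beta_true, beta_hat, thresh=0.01):
    """Support-recovery metrics: threshold |beta_hat| at `thresh` and compare
    the recovered support against the true non-zero entries of beta_true."""
    s_true = np.abs(beta_true) > 0
    s_hat = np.abs(beta_hat) > thresh
    tp = np.sum(s_hat & s_true)    # true non-zeros recovered
    tn = np.sum(~s_hat & ~s_true)  # true zeros recovered
    fp = np.sum(s_hat & ~s_true)
    fn = np.sum(~s_hat & s_true)
    tpr = tp / max(tp + fn, 1)
    tnr = tn / max(tn + fp, 1)
    return {
        "avg_tp_tn": 0.5 * (tpr + tnr),      # average of TP and TN rates
        "precision": tp / max(tp + fp, 1),
        "recall": tpr,
    }

# Candidate lambda-tilde values: 0.1 to 8.0 with grid size 0.1 (80 points).
lambda_grid = np.round(np.arange(0.1, 8.0 + 1e-9, 0.1), 1)
```

In the quoted setup, each grid value would be used to fit the classifier and scored on the validation set; the metrics above are then reported for the selected fit.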