Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

QuanDA: Quantile-Based Discriminant Analysis for High-Dimensional Imbalanced Classification

Authors: Qian Tang, Yuwen Gu, Boxiang Wang

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We provide comprehensive theoretical analysis to validate QuanDA in ultra-high dimensional settings. Through extensive simulation studies and high-dimensional benchmark data analysis, we demonstrate that QuanDA overall outperforms existing classification methods for imbalanced data, including cost-sensitive large-margin classifiers, random forests, and SMOTE.
Researcher Affiliation	Academia	Qian Tang, School of Statistics, University of Minnesota, Minneapolis, MN 55455; Yuwen Gu, Department of Statistics, University of Connecticut, Storrs, Connecticut 06269; Boxiang Wang, Department of Statistics and Actuarial Science, University of Iowa, Iowa City, IA 52246
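Pseudocode	Yes	Algorithm 1: Quantile-based Discriminant Analysis. Input: training data $\{x_i, y_i\}_{i=1}^{n}$, $\tau = \hat{\pi}_1$. for $r = 1, 2, \ldots, 10$ do: Generate $U^{[r]} \sim \mathrm{Uniform}(0, 1)$. Compute $Z^{[r]} = Y + U^{[r]}$. for $t = \tau - 0.05, \tau - 0.04, \ldots, \tau + 0.04, \tau + 0.05$ do: For each $\lambda_1$, compute $(\hat{\alpha}^{[r]}(t, \lambda_1), \hat{\beta}^{[r]}(t, \lambda_1))$ from $\min_{\alpha, \beta} \frac{1}{n} \sum_{i=1}^{n} \rho_{\tau}(z_i^{[r]} - \alpha - x_i^{\top} \beta) + \lambda_1 \|\beta\|_1$. Perform five-fold cross-validation to determine the optimal $\lambda_1(t)^{[r]}$ based on the AUC scores. end for. end for. for $t = \tau - 0.05, \tau - 0.04, \ldots, \tau + 0.04, \tau + 0.05$ do: Compute $\hat{\alpha}(t) = \frac{1}{10} \sum_{r=1}^{10} \hat{\alpha}^{[r]}(t, \lambda_1(t)^{[r]})$ and $\hat{\beta}(t) = \frac{1}{10} \sum_{r=1}^{10} \hat{\beta}^{[r]}(t, \lambda_1(t)^{[r]})$. Calculate the AUC scores based on $(\hat{\alpha}(t), \hat{\beta}(t))$ to select the best $t$. end for.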
Open Source Code	Yes	The implementation of QuanDA is available at https://anonymous.4open.science/status/QuanDA-57FE.
Open Datasets	Yes	In this section, we demonstrate the performance of QuanDA using seven benchmark high-dimensional data (Mai and Zou, 2015; Sorace and Zhan, 2003; Graham et al., 2010; Alon et al., 1999; Golub et al., 1999; Singh et al., 2002; Tsanas et al., 2013). All the benchmark data are available at the UCI Machine Learning Repository (Kelly et al., 2023).
Dataset Splits	Yes	We randomly split the simulation data into a training set of size 200 and a test set of size 200. ... Each data set is partitioned into two parts: 70% is used for training and the remaining 30% for testing.
Hardware Specification	Yes	All numerical experiments in this work were carried out on an Intel(R) Xeon(R) Gold 6430 (3.40 GHz) processor.
Software Dependencies	No	The paper mentions several R packages (e.g., glmnet, dsda, randomForest, smotefamily) and other solvers (hdqr, fhdqr) but does not provide specific version numbers for any of them.
Experiment Setup	Yes	For each $\lambda_1$, compute $(\hat{\alpha}^{[r]}(t, \lambda_1), \hat{\beta}^{[r]}(t, \lambda_1))$ from $\min_{\alpha, \beta} \frac{1}{n} \sum_{i=1}^{n} \rho_{\tau}(z_i^{[r]} - \alpha - x_i^{\top} \beta) + \lambda_1 \|\beta\|_1$. Perform five-fold cross-validation to determine the optimal $\lambda_1(t)^{[r]}$ based on the AUC scores. ... To address imbalanced classification, we perform weighted logistic regression and weighted random forest, where the weights are determined based on the class proportions. Specifically, the weight for the majority class is set to $1/\pi_0$ and for the minority class, it is set to $1/\pi_1$. For both logistic regression and dsda, we also employ five-fold cross-validation to select the optimal parameter $\lambda_1$, given that $\lambda_2 = 0.01$.
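
The building blocks quoted in the Pseudocode and Experiment Setup rows can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it covers only the jittered response Z = Y + U, the check loss, the L1-penalized objective, the grid of quantile levels around τ, and the inverse-proportion class weights used for the weighted baselines. All function names are hypothetical. Treating π̂1 as the share of labels equal to 1 and dropping grid points outside (0, 1) are assumptions. The solver, the five-fold cross-validation, and the AUC-based selection are omitted.

```python
import random


def check_loss(u, level):
    """Quantile check loss: rho(u) = u * (level - 1{u < 0})."""
    return u * (level - (1.0 if u < 0 else 0.0))


def jitter_labels(y, rng):
    """Return Z = Y + U with U ~ Uniform(0, 1), one draw per observation."""
    return [yi + rng.random() for yi in y]


def penalized_objective(alpha, beta, x, z, level, lam1):
    """(1/n) * sum_i rho(z_i - alpha - x_i' beta) + lam1 * ||beta||_1."""
    total = 0.0
    for xi, zi in zip(x, z):
        fitted = alpha + sum(b * xij for b, xij in zip(beta, xi))
        total += check_loss(zi - fitted, level)
    return total / len(z) + lam1 * sum(abs(b) for b in beta)


def quantile_grid(tau, half_width=0.05, step=0.01):
    """Levels tau - 0.05, ..., tau + 0.05 in steps of 0.01.

    Assumption: levels outside (0, 1) are dropped; the quoted text
    does not say how such values are handled.
    """
    k = round(half_width / step)
    grid = [round(tau + j * step, 10) for j in range(-k, k + 1)]
    return [t for t in grid if 0.0 < t < 1.0]


def inverse_proportion_weights(y):
    """Class weights 1/pi_0 and 1/pi_1 for binary labels in {0, 1}."""
    pi1 = sum(y) / len(y)
    return {0: 1.0 / (1.0 - pi1), 1: 1.0 / pi1}


# Toy usage on made-up data (hypothetical values, for illustration only).
rng = random.Random(0)
y = [0, 0, 0, 1]
x = [[0.1, 1.0], [0.2, 0.5], [0.0, 0.3], [0.9, 0.2]]
tau = sum(y) / len(y)        # assumed reading of tau = pi_1 hat
z = jitter_labels(y, rng)    # one jittered copy; Algorithm 1 draws 10
for t in quantile_grid(tau):
    value = penalized_objective(0.0, [0.0, 0.0], x, z, t, lam1=0.1)
```

In a fuller sketch, each grid level would be passed to a penalized quantile regression solver, and the fitted coefficients would be averaged over the 10 jittered copies as the pseudocode describes.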