Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
QuanDA: Quantile-Based Discriminant Analysis for High-Dimensional Imbalanced Classification
Authors: Qian Tang, Yuwen Gu, Boxiang Wang
NeurIPS 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We provide comprehensive theoretical analysis to validate QuanDA in ultra-high dimensional settings. Through extensive simulation studies and high-dimensional benchmark data analysis, we demonstrate that QuanDA overall outperforms existing classification methods for imbalanced data, including cost-sensitive large-margin classifiers, random forests, and SMOTE. |
| Researcher Affiliation | Academia | Qian Tang School of Statistics University of Minnesota Minneapolis, MN, 55455 EMAIL Yuwen Gu Department of Statistics University of Connecticut Storrs, Connecticut, 06269 EMAIL Boxiang Wang Department of Statistics and Actuarial Science University of Iowa Iowa City, IA, 52246 EMAIL |
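| Pseudocode | Yes | Algorithm 1 Quantile-based Discriminant Analysis. Input: Training data $\{x_i, y_i\}_{i=1}^{n}$, $\tau = \hat{\pi}_1$. for $r = 1, 2, \ldots, 10$ do Generate $U^{[r]} \sim \mathrm{Uniform}(0, 1)$. Compute $Z^{[r]} = Y + U^{[r]}$. for $t = \tau - 0.05, \tau - 0.04, \ldots, \tau + 0.04, \tau + 0.05$ do For each $\lambda_1$, compute $(\hat{\alpha}^{[r]}(t, \lambda_1), \hat{\beta}^{[r]}(t, \lambda_1))$ from $\min_{\alpha, \beta} \frac{1}{n} \sum_{i=1}^{n} \rho_{\tau}(z_i^{[r]} - \alpha - x_i^{\top} \beta) + \lambda_1 \lVert \beta \rVert_1$. Perform five-fold cross-validation to determine the optimal $\lambda_1(t)^{[r]}$ based on the AUC scores. end for end for for $t = \tau - 0.05, \tau - 0.04, \ldots, \tau + 0.04, \tau + 0.05$ do Compute $\hat{\alpha}(t) = \frac{1}{10} \sum_{r=1}^{10} \hat{\alpha}^{[r]}(t, \lambda_1(t)^{[r]})$ and $\hat{\beta}(t) = \frac{1}{10} \sum_{r=1}^{10} \hat{\beta}^{[r]}(t, \lambda_1(t)^{[r]})$. Calculate the AUC scores based on $(\hat{\alpha}(t), \hat{\beta}(t))$ to select the best $t$. end for |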
| Open Source Code | Yes | The implementation of QuanDA is available at https://anonymous.4open.science/status/QuanDA-57FE. |
| Open Datasets | Yes | In this section, we demonstrate the performance of QuanDA using seven benchmark high-dimensional data (Mai and Zou, 2015; Sorace and Zhan, 2003; Graham et al., 2010; Alon et al., 1999; Golub et al., 1999; Singh et al., 2002; Tsanas et al., 2013). All the benchmark data are available at the UCI Machine Learning Repository (Kelly et al., 2023). |
| Dataset Splits | Yes | We randomly split the simulation data into a training set of size 200 and a test set of size 200. ... Each data set is partitioned into two parts: 70% is used for training and the remaining 30% for testing. |
| Hardware Specification | Yes | All numerical experiments in this work were carried out on an Intel(R) Xeon(R) Gold 6430 (3.40 GHz) processor. |
| Software Dependencies | No | The paper mentions several R packages (e.g., glmnet, dsda, randomForest, smotefamily) and other solvers (hdqr, fhdqr) but does not provide specific version numbers for any of them. |
| Experiment Setup | Yes | For each $\lambda_1$, compute $(\hat{\alpha}^{[r]}(t, \lambda_1), \hat{\beta}^{[r]}(t, \lambda_1))$ from $\min_{\alpha, \beta} \frac{1}{n} \sum_{i=1}^{n} \rho_{\tau}(z_i^{[r]} - \alpha - x_i^{\top} \beta) + \lambda_1 \lVert \beta \rVert_1$. Perform five-fold cross-validation to determine the optimal $\lambda_1(t)^{[r]}$ based on the AUC scores. ... To address imbalanced classification, we perform weighted logistic regression and weighted random forest, where the weights are determined based on the class proportions. Specifically, the weight for the majority class is set to $1/\pi_0$ and for the minority class, it is set to $1/\pi_1$. For both logistic regression and dsda, we also employ five-fold cross-validation to select the optimal parameter $\lambda_1$, given that $\lambda_2 = 0.01$. |
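
The quantities quoted in the Pseudocode and Experiment Setup rows can be illustrated with a minimal Python sketch. It is not the authors' implementation, which the table indicates relies on R packages and dedicated solvers. It covers only the building blocks: the check loss, the jittered response Z = Y + U, the grid of quantile levels around τ, the L1-penalized objective, an AUC score, and the inverse-proportion class weights used for the weighted baselines. All function names are hypothetical, the jitter is assumed to be drawn independently for each observation, and the sketch evaluates the objective without solving the minimization.

```python
import random


def check_loss(u, level):
    """Quantile check loss: rho(u) = u * (level - 1{u < 0})."""
    return u * (level - (1.0 if u < 0 else 0.0))


def jitter(y, rng):
    """Z = Y + U with U ~ Uniform(0, 1); one draw per observation (assumption)."""
    return [yi + rng.random() for yi in y]


def quantile_grid(tau, half_width=0.05, step=0.01):
    """Candidate levels tau - 0.05, tau - 0.04, ..., tau + 0.05 (no clipping applied)."""
    k = round(half_width / step)
    return [round(tau + j * step, 10) for j in range(-k, k + 1)]


def penalized_objective(alpha, beta, X, z, level, lam1):
    """Mean check loss of the residuals z_i - alpha - x_i'beta plus lam1 * ||beta||_1."""
    total = 0.0
    for xi, zi in zip(X, z):
        fitted = alpha + sum(b * x for b, x in zip(beta, xi))
        total += check_loss(zi - fitted, level)
    return total / len(z) + lam1 * sum(abs(b) for b in beta)


def auc(scores, labels):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def class_weights(y):
    """Weights 1/pi_0 and 1/pi_1 from the observed class proportions."""
    pi1 = sum(y) / len(y)
    return {0: 1.0 / (1.0 - pi1), 1: 1.0 / pi1}


# Toy usage with made-up data (illustrative only).
rng = random.Random(0)
y = [1, 0, 0, 0, 0, 1, 0, 0]
X = [[0.9], [0.1], [0.2], [0.0], [0.3], [0.8], [0.2], [0.1]]
tau = sum(y) / len(y)              # tau is set to the estimated class-1 proportion
z = jitter(y, rng)                 # jittered response for one repetition r
levels = quantile_grid(tau)        # candidate quantile levels t
value = penalized_objective(0.0, [1.0], X, z, levels[0], lam1=0.1)
```

In a full pipeline, a penalized quantile regression solver would minimize `penalized_objective` for each level and each penalty value, with five-fold cross-validation on the AUC choosing the penalty and the coefficients averaged over the 10 jitter repetitions, as the quoted algorithm describes.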