Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Flexible High-Dimensional Classification Machines and Their Asymptotic Properties

Authors: Xingye Qiao, Lingsong Zhang

JMLR 2015 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental Simulations and real-data applications are investigated to illustrate the theoretical findings. Keywords: classification, Fisher consistency, high-dimensional, low-sample-size asymptotics, imbalanced data, support vector machine. Section 6 demonstrates its properties using simulation experiments; a real application study is conducted in Section 7.
Researcher Affiliation Academia Xingye Qiao, EMAIL, Department of Mathematical Sciences, Binghamton University, State University of New York, Binghamton, NY 13902-6000, USA. Lingsong Zhang, EMAIL, Department of Statistics, Purdue University, West Lafayette, IN 47907, USA.
Pseudocode Yes Algorithm 1 (Adaptive parameter): 1. Initialize θ_0 = 0. 2. For k = 0, 1, ...: (a) Solve for the FLAME solutions ω(θ_k) and β(θ_k) given parameter θ_k. (b) Let θ_{k+1} = max{θ_k, [g^-_(n_+)(θ_k) C]^{-1}}, where g^-_j(θ_k) is the functional margin y_j(x_j^T ω(θ_k) + β(θ_k)) of the jth vector in the negative/majority class and g^-_(l)(θ_k) is the lth order statistic of these functional margins. 3. The iteration stops when θ_k = θ_{k-1}.
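The adaptive-parameter iteration quoted above can be sketched as follows. The real solver is the authors' MATLAB routine; the `solve_flame` function here is a toy mean-difference stand-in, and its name, the stopping tolerance, and the test data are my assumptions, not the paper's implementation.

```python
import numpy as np

def solve_flame(theta, X, y):
    # Toy surrogate for the FLAME optimizer: a mean-difference direction
    # that shrinks as theta grows. NOT the paper's actual solver.
    mu_pos = X[y == 1].mean(axis=0)
    mu_neg = X[y == -1].mean(axis=0)
    omega = (mu_pos - mu_neg) / (1.0 + theta)
    beta = -0.5 * (mu_pos + mu_neg) @ omega
    return omega, beta

def adaptive_theta(X, y, C, n_plus, max_iter=50, tol=1e-12):
    """Sketch of Algorithm 1: theta_{k+1} = max{theta_k, [g_(n+)(theta_k) C]^{-1}}."""
    theta = 0.0
    for _ in range(max_iter):
        omega, beta = solve_flame(theta, X, y)
        neg = y == -1
        margins = y[neg] * (X[neg] @ omega + beta)   # functional margins g_j
        g_order = np.sort(margins)[n_plus - 1]       # n_+ th order statistic
        # Assumes a positive order statistic; a non-positive margin would
        # need separate handling.
        theta_next = max(theta, 1.0 / (g_order * C))
        if abs(theta_next - theta) < tol:            # theta_k = theta_{k-1}
            break
        theta = theta_next
    return theta

# Deterministic toy data: three copies of +1^5 and three of -1^5.
X = np.vstack([np.tile(np.ones(5), (3, 1)), np.tile(-np.ones(5), (3, 1))])
y = np.array([1, 1, 1, -1, -1, -1], dtype=float)
theta_hat = adaptive_theta(X, y, C=1.0, n_plus=3)
```

For this symmetric toy data the update reduces to theta_{k+1} = (1 + theta_k)/10, a contraction converging to 1/9.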
Open Source Code Yes A MATLAB routine has been implemented and is available at the authors' personal websites. See Online Appendix 1 for more details on the implementation.
Open Datasets Yes In this section we demonstrate the performance of FLAME on a real example: the Human Lung Carcinomas Microarray Data set, which has been analyzed earlier in Bhattacharjee et al. (2001).
Dataset Splits Yes We conduct five-fold cross-validations (CV) to evaluate the within-group error for the two classes over 100 random splits. In each split, we apply FLAME with 21 different θ values: 0, 0.05, 0.1, ..., 1.
Hardware Specification No The paper describes simulation experiments and a real data application but does not specify any particular hardware used for running the experiments.
Software Dependencies No A MATLAB routine has been implemented and is available at the authors' personal websites. See Online Appendix 1 for more details on the implementation. No specific version of MATLAB or any other software dependency is mentioned.
Experiment Setup Yes In this simulation setting, data are drawn from multivariate normal distributions with identity covariance matrix, MVN_d(µ_±, I_d), where d = 100, 400, 700 and 1000. We let µ_0 = c(d, d-1, d-2, ..., 1)^T, where c > 0 is a constant that scales µ_0 to have norm 2.7. Then we let µ_+ = µ_0 and µ_- = -µ_0. The imbalance factor varies among 1, 4 and 9 while the total sample size is 240. For each experiment, we repeat the simulation 50 times... We conduct five-fold cross-validations (CV) to evaluate the within-group error for the two classes over 100 random splits. In each split, we apply FLAME with 21 different θ values: 0, 0.05, 0.1, ..., 1.
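The simulation design in this row can be sketched in a few lines. The function name, the random seed, and the interpretation of the imbalance factor as n_-/n_+ are my assumptions; the mean construction and sample sizes follow the quoted setup.

```python
import numpy as np

def make_simulation(d=100, n_total=240, imbalance=4, seed=0):
    # mu0 = c*(d, d-1, ..., 1)^T, with c chosen so that ||mu0|| = 2.7;
    # mu_+ = mu0, mu_- = -mu0, and X ~ MVN_d(mu_±, I_d).
    rng = np.random.default_rng(seed)
    mu0 = np.arange(d, 0, -1).astype(float)
    mu0 *= 2.7 / np.linalg.norm(mu0)
    # Assumed split: imbalance factor = n_- / n_+ (majority = negative class).
    n_neg = n_total * imbalance // (imbalance + 1)
    n_pos = n_total - n_neg
    X_pos = rng.standard_normal((n_pos, d)) + mu0
    X_neg = rng.standard_normal((n_neg, d)) - mu0
    X = np.vstack([X_pos, X_neg])
    y = np.concatenate([np.ones(n_pos), -np.ones(n_neg)])
    return X, y

# The 21 theta values used in each CV split: 0, 0.05, 0.1, ..., 1.
thetas = np.linspace(0.0, 1.0, 21)
```

With imbalance factor 4 and n_total = 240, this yields 48 positive and 192 negative observations per data set.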