Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Non-splitting Neyman-Pearson Classifiers
Authors: Jingming Wang, Lucy Xia, Zhigang Bao, Xin Tong
JMLR 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Numerical experiments have confirmed the advantages of our new non-splitting parametric strategy. The paper includes Numerical Analysis, Simulation Studies, and Real Data Analysis sections. |
| Researcher Affiliation | Academia | Jingming Wang, Department of Statistics, University of Virginia; Lucy Xia, Department of ISOM, School of Business and Management, Hong Kong University of Science and Technology; Zhigang Bao, Department of Mathematics, The University of Hong Kong; Xin Tong, Department of Data Sciences and Operations, Marshall School of Business, University of Southern California |
| Pseudocode | No | The paper describes methods in text and mathematical formulas, but no explicit pseudocode or algorithm blocks are provided. |
| Open Source Code | No | The paper discusses licensing for the publication itself but does not provide any statement or link regarding the open-sourcing of the code for the methodology described. |
| Open Datasets | Yes | Fashion-MNIST is a widely used imaging dataset for benchmarking machine learning algorithms. It contains 60,000 training images and 10,000 test images from ten different fashion categories... The first dataset is a lung cancer dataset (Gordon et al., 2002; Jin and Wang, 2016) that consists of gene expression measurements from 181 tissue samples. The second dataset was originally studied in Su et al. (2001). It contains microarray data from 11 different tumor types... We consider the popular network intrusion classification problem and apply the NP classifiers to the CSE-CIC-IDS2018 dataset (Sharafaldin et al., 2018). |
| Dataset Splits | Yes | In each replication, we randomly split the full dataset (class 0 and class 1 separately) into a training set (composed of 70% of the data), and a test set (composed of 30% of the data). |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running its experiments. |
| Software Dependencies | No | The paper mentions implementing NP umbrella algorithms using 'the R package npc with default parameters,' but it does not specify version numbers for R or the npc package. |
| Experiment Setup | Yes | For all five splitting NP classifiers, τ, the class 0 split proportion, is fixed at 0.5, and each experiment is repeated 1,000 times. We set the type I error upper bound α = 0.05 and the type I error violation rate target δ = 0.1. ... For NP-svm, npc adopted the radial kernel for analysis. ... In the first scenario, we randomly selected 10% of the dataset as training data... In the second scenario, we randomly selected 5% of the dataset as training data... |
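The quoted split protocol (class 0 and class 1 divided separately into 70% training and 30% test in each replication) amounts to a per-class stratified split. A minimal sketch in Python; the function name and seeding are illustrative, not taken from the paper (no code is released):

```python
import numpy as np

def stratified_split(y, train_frac=0.7, seed=0):
    """Split indices into train/test sets, class by class, as in the
    paper's protocol: each class contributes train_frac of its samples
    to the training set and the remainder to the test set."""
    rng = np.random.default_rng(seed)
    train_idx, test_idx = [], []
    for cls in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == cls))  # shuffle within class
        cut = int(round(train_frac * len(idx)))
        train_idx.extend(idx[:cut])
        test_idx.extend(idx[cut:])
    return np.array(train_idx), np.array(test_idx)
```

Repeating this with a fresh seed per replication reproduces the "in each replication, we randomly split" design.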
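The splitting NP classifiers benchmarked here follow the NP umbrella algorithm (Tong, Feng, and Li, 2018): the classification threshold is an order statistic of held-out class-0 scores, chosen so that the probability of the type I error exceeding α is at most δ. A sketch of that order-statistic rule with the quoted α = 0.05 and δ = 0.1; names are illustrative, and the paper's own experiments used the R package npc rather than this code:

```python
import math

def np_threshold(class0_scores, alpha=0.05, delta=0.1):
    """Return the smallest order statistic of held-out class-0 scores
    whose type I error exceeds alpha with probability at most delta.
    Classifying 'score > threshold' as class 1 then satisfies the NP
    violation-rate guarantee."""
    s = sorted(class0_scores)
    n = len(s)
    for k in range(1, n + 1):
        # P(type I error > alpha) when the k-th smallest score is the
        # threshold: upper tail of Binomial(n, 1 - alpha) at k.
        violation = sum(
            math.comb(n, j) * (1 - alpha) ** j * alpha ** (n - j)
            for j in range(k, n + 1)
        )
        if violation <= delta:
            return s[k - 1]
    raise ValueError("too few class-0 scores to meet the delta target")
```

With α = 0.05 and δ = 0.1, at least 45 class-0 scores are needed before any order statistic qualifies, which is why the splitting strategies reserve a τ = 0.5 fraction of class 0 for threshold selection.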