Robust High-Dimensional Classification From Few Positive Examples

Authors: Deepayan Chakrabarti, Benjamin Fauber

IJCAI 2022

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We validate DIRECT on several real-world datasets spanning document, image, and medical classification. DIRECT is up to 5x-7x better than SMOTE-like methods, 30-200% better than ensemble methods, and 3x-7x better than cost-sensitive methods. The greatest gains are in settings with the fewest samples in the minority class, where DIRECT's robustness is most helpful. |
| Researcher Affiliation | Collaboration | Deepayan Chakrabarti (University of Texas at Austin), Benjamin Fauber (Dell Inc.) |
| Pseudocode | Yes | Algorithm 1: DIRECT |
| Open Source Code | Yes | Our code is available at https://github.com/deepayan12/direct. |
| Open Datasets | Yes | We ran experiments on six text, two image, and one medical dataset, along with 20 UCI datasets (Table 1). Table 1 lists specific datasets such as 20-Newsgroups, Reuters, MNIST (digits), and UCI (20 datasets). The Tumors dataset refers to [Yeang et al., 2001]. |
| Dataset Splits | No | In each experiment, we created a training set with n_lo positive and n_hi negative samples that were randomly chosen from the dataset. All remaining datapoints were used for testing. The paper states DIRECT "does not need cross-validation" and does not describe using a separate validation set for its own experiments. |
| Hardware Specification | No | The paper mentions evaluating "Wall-Clock Time" but provides no details about the hardware (e.g., CPU, GPU, memory, specific models) used to run the experiments. |
| Software Dependencies | No | The paper mentions using a linear SVM, XGBoost, and "any off-the-shelf solver", but does not provide version numbers for any of these software components or libraries. |
| Experiment Setup | Yes | Our proposed classifier uses a robust kernel density to model the minority-class distribution. With an appropriate choice of loss function, ℓ(y, x; θ = (c, w)) = max(0, 1 - y(c + wᵀx)), and a post-processing step, we adjust the intercept: min_{c ∈ ℝ} ... 1{c + wᵀxᵢ}. In each experiment, we created a training set with n_lo positive and n_hi negative samples that were randomly chosen from the dataset. All remaining datapoints were used for testing. We ran experiments on 509 unique (dataset, class, n_lo, n_hi) combinations, each repeated 30 times. |
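The sampling protocol and hinge loss quoted in the Experiment Setup row can be sketched as follows. This is a minimal illustration, not the paper's DIRECT implementation: the toy dataset, dimensions, and sample counts are placeholders, and the intercept post-processing step is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

def hinge_loss(y, x, c, w):
    # Loss from the paper: l(y, x; theta=(c, w)) = max(0, 1 - y(c + w^T x)),
    # with labels y in {-1, +1}.
    return max(0.0, 1.0 - y * (c + w @ x))

def make_split(X, y, n_lo, n_hi, rng):
    # Training set: n_lo randomly chosen positives and n_hi randomly chosen
    # negatives; all remaining datapoints are used for testing.
    pos = rng.permutation(np.where(y == 1)[0])
    neg = rng.permutation(np.where(y == -1)[0])
    train_idx = np.concatenate([pos[:n_lo], neg[:n_hi]])
    test_idx = np.concatenate([pos[n_lo:], neg[n_hi:]])
    return train_idx, test_idx

# Toy imbalanced dataset (placeholder sizes): 20 positives, 200 negatives.
X = rng.normal(size=(220, 5))
y = np.concatenate([np.ones(20), -np.ones(200)])
X[y == 1] += 2.0  # shift positives so the classes are roughly separable

train_idx, test_idx = make_split(X, y, n_lo=5, n_hi=50, rng=rng)
print(len(train_idx), len(test_idx))  # 55 165
```

In the paper's experiments this split is redrawn 30 times per (dataset, class, n_lo, n_hi) combination; the loop is omitted here for brevity.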