Sufficient dimension reduction for classification using principal optimal transport direction

Authors: Cheng Meng, Jun Yu, Jingyi Zhang, Ping Ma, Wenxuan Zhong

NeurIPS 2020

Reproducibility assessment (variable, result, and supporting LLM response):
Research Type: Experimental
Evidence: "Empirical studies show POTD outperforms most of the state-of-the-art linear dimension reduction methods." "Furthermore, we show the proposed method outperforms several state-of-the-art linear dimension reduction methods in terms of classification accuracy through extensive experiments on various real-world datasets."
Researcher Affiliation: Academia
Evidence: 1. Center for Applied Statistics, Institute of Statistics and Big Data, Renmin University of China; 2. School of Mathematics and Statistics, Beijing Institute of Technology; 3. Center for Statistical Science, Tsinghua University; 4. Department of Statistics, University of Georgia
Pseudocode: Yes
Evidence: Algorithm 1, Principal Optimal Transport Direction (POTD):

  Input: X in R^{n x d}, Y in {1, ..., k}^n, weights a in R^n, structure dimension r
  for i in 1:k do
    for j in {1, ..., i-1, i+1, ..., k} do
      G_ij <- OT[(X^(i), a_i), (X^(j), a_j), cost = ||.||^2]
      Lambda_ij <- diag(a_i) X^(i) - G_ij X^(j)
    end for
  end for
  Lambda <- stack of the blocks Lambda^(1), ..., Lambda^(k), where Lambda^(i) stacks Lambda_i1, ..., Lambda_ik
  Output: v_1, ..., v_r, i.e., the first r right singular vectors of Lambda
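The POTD recipe (pairwise optimal-transport plans between classes, stacked displacement blocks, then an SVD) can be sketched in Python. This is a minimal illustration, not the authors' implementation: the LP-based OT solver via `scipy.optimize.linprog`, the within-class renormalization of the weights `a`, and the function names are all assumptions made for the sketch.

```python
import numpy as np
from scipy.optimize import linprog


def ot_plan(Xi, Xj, ai, aj):
    """Discrete optimal transport plan between (Xi, ai) and (Xj, aj),
    squared-Euclidean cost, solved as a linear program (illustrative solver)."""
    ni, nj = len(ai), len(aj)
    # Cost matrix C[p, q] = ||Xi[p] - Xj[q]||^2
    C = ((Xi[:, None, :] - Xj[None, :, :]) ** 2).sum(-1)
    # Marginal constraints: row sums = ai, column sums = aj.
    A_eq = []
    for p in range(ni):
        row = np.zeros((ni, nj)); row[p, :] = 1.0
        A_eq.append(row.ravel())
    for q in range(nj):
        col = np.zeros((ni, nj)); col[:, q] = 1.0
        A_eq.append(col.ravel())
    b_eq = np.concatenate([ai, aj])
    res = linprog(C.ravel(), A_eq=np.array(A_eq), b_eq=b_eq,
                  bounds=(0, None), method="highs")
    return res.x.reshape(ni, nj)


def potd(X, y, r, a=None):
    """Sketch of Algorithm 1 (POTD). Weights are renormalized within each
    class so both OT marginals sum to one -- an assumption of this sketch."""
    n = X.shape[0]
    if a is None:
        a = np.full(n, 1.0 / n)
    classes = np.unique(y)
    blocks = []
    for ci in classes:
        Xi, ai = X[y == ci], a[y == ci]
        ai = ai / ai.sum()
        for cj in classes:
            if cj == ci:
                continue
            Xj, aj = X[y == cj], a[y == cj]
            aj = aj / aj.sum()
            G = ot_plan(Xi, Xj, ai, aj)
            # Displacement block: Lambda_ij = diag(a_i) X^(i) - G_ij X^(j)
            blocks.append(np.diag(ai) @ Xi - G @ Xj)
    Lam = np.vstack(blocks)
    # First r right singular vectors of the stacked matrix Lambda.
    _, _, Vt = np.linalg.svd(Lam, full_matrices=False)
    return Vt[:r].T  # d x r projection matrix
```

For two well-separated Gaussian classes, the leading direction recovered by this sketch aligns with the axis separating the class means, which is the behavior the algorithm is designed for.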
Open Source Code: No
Evidence: The paper does not provide an explicit statement about releasing source code for the methodology, nor does it include a link to a code repository.
Open Datasets: Yes
Evidence: "We consider seven multi-class real-world datasets: Breast Cancer Wisconsin (WDBC), Letter Recognition (LETTER), Pop failures (POP), QSAR biodegradation (BIODEG), Connectionist Bench Sonar (SONAR), and Optical Recognition of Handwritten Digits (OPTDIGITS)." "All the datasets are downloaded from UCI machine learning repository [6]"
Dataset Splits: Yes
Evidence: "For each dataset, we replicate the experiment one hundred times. In each replication, each dataset is randomly divided into the training set and the testing set of equal sizes."
Hardware Specification: No
Evidence: The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments.
Software Dependencies: No
Evidence: The paper mentions using the R packages 'Rdimtools' and 'dr' but does not provide specific version numbers for these or any other software dependencies needed to replicate the experiments.
Experiment Setup: Yes
Evidence: "Throughout the simulation, we set n = 400 and p = 10, 20, 30. ... For all other methods, we consider five different choices of structure dimension r, i.e., r = {2, 4, 6, 8, 10}. For each r, the training set is first projected to a r-dimensional subspace. We then apply K-nearest neighbor classifier to the projected training set, and K is set to be 10."
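The evaluation protocol described above (one replication = an equal-size random train/test split, projection to an r-dimensional subspace, then a K-nearest-neighbor classifier with K = 10) can be sketched with NumPy. The function names and the plain majority-vote KNN below are illustrative assumptions, not the authors' code; any d x r projection matrix V (e.g., one estimated by a dimension reduction method) can be plugged in.

```python
import numpy as np


def knn_predict(X_train, y_train, X_test, k=10):
    """Majority-vote K-nearest-neighbor classifier with Euclidean distance."""
    # Pairwise squared distances: (n_test, n_train)
    d2 = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(-1)
    nn = np.argsort(d2, axis=1)[:, :k]  # indices of the k nearest neighbors
    preds = []
    for row in y_train[nn]:
        vals, counts = np.unique(row, return_counts=True)
        preds.append(vals[np.argmax(counts)])
    return np.array(preds)


def evaluate_split(X, y, V, k=10, rng=None):
    """One replication of the protocol: equal-size random train/test split,
    project onto the columns of V, classify with KNN, return test accuracy."""
    rng = np.random.default_rng(rng)
    n = len(y)
    perm = rng.permutation(n)
    tr, te = perm[: n // 2], perm[n // 2:]
    Z = X @ V  # project onto the r estimated directions
    y_hat = knn_predict(Z[tr], y[tr], Z[te], k=k)
    return (y_hat == y[te]).mean()
```

Repeating `evaluate_split` one hundred times with fresh random splits and averaging the returned accuracies mirrors the replication scheme described in the paper.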