Sufficient dimension reduction for classification using principal optimal transport direction

Authors: Cheng Meng, Jun Yu, Jingyi Zhang, Ping Ma, Wenxuan Zhong

NeurIPS 2020

Reproducibility assessment (variable, result, and supporting LLM response):
Research Type: Experimental
Evidence: "Empirical studies show POTD outperforms most of the state-of-the-art linear dimension reduction methods." "Furthermore, we show the proposed method outperforms several state-of-the-art linear dimension reduction methods in terms of classification accuracy through extensive experiments on various real-world datasets."
Researcher Affiliation: Academia
Evidence: 1. Center for Applied Statistics, Institute of Statistics and Big Data, Renmin University of China; 2. School of Mathematics and Statistics, Beijing Institute of Technology; 3. Center for Statistical Science, Tsinghua University; 4. Department of Statistics, University of Georgia
Pseudocode: Yes
Evidence: Algorithm 1, Principal Optimal Transport Direction (POTD):

  Input: X in R^{n x d}, Y in {1, ..., k}^n, weights a in R^n, structure dimension r
  for i in 1:k do
    for j in {1, ..., i-1, i+1, ..., k} do
      G_ij <- OT[(X^(i), a_i), (X^(j), a_j), cost = ||.||^2]
      Lambda_ij <- diag(a_i) X^(i) - G_ij X^(j)
    end for
  end for
  Lambda <- stack of the blocks Lambda^(1), ..., Lambda^(k), where Lambda^(i) stacks Lambda_i1, ..., Lambda_ik
  Output: v_1, ..., v_r, i.e., the first r right singular vectors of Lambda
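The POTD recipe (pairwise optimal-transport plans between classes, stacked displacement blocks, then an SVD) can be sketched in Python. This is a minimal illustration, not the authors' implementation: the LP-based OT solver via `scipy.optimize.linprog`, the within-class renormalization of the weights `a`, and the function names are all assumptions made for the sketch.

```python
import numpy as np
from scipy.optimize import linprog


def ot_plan(Xi, Xj, ai, aj):
    """Discrete optimal transport plan between (Xi, ai) and (Xj, aj),
    squared-Euclidean cost, solved as a linear program (illustrative solver)."""
    ni, nj = len(ai), len(aj)
    # Cost matrix C[p, q] = ||Xi[p] - Xj[q]||^2
    C = ((Xi[:, None, :] - Xj[None, :, :]) ** 2).sum(-1)
    # Marginal constraints: row sums = ai, column sums = aj.
    A_eq = []
    for p in range(ni):
        row = np.zeros((ni, nj)); row[p, :] = 1.0
        A_eq.append(row.ravel())
    for q in range(nj):
        col = np.zeros((ni, nj)); col[:, q] = 1.0
        A_eq.append(col.ravel())
    b_eq = np.concatenate([ai, aj])
    res = linprog(C.ravel(), A_eq=np.array(A_eq), b_eq=b_eq,
                  bounds=(0, None), method="highs")
    return res.x.reshape(ni, nj)


def potd(X, y, r, a=None):
    """Sketch of Algorithm 1 (POTD). Weights are renormalized within each
    class so both OT marginals sum to one -- an assumption of this sketch."""
    n = X.shape[0]
    if a is None:
        a = np.full(n, 1.0 / n)
    classes = np.unique(y)
    blocks = []
    for ci in classes:
        Xi, ai = X[y == ci], a[y == ci]
        ai = ai / ai.sum()
        for cj in classes:
            if cj == ci:
                continue
            Xj, aj = X[y == cj], a[y == cj]
            aj = aj / aj.sum()
            G = ot_plan(Xi, Xj, ai, aj)
            # Displacement block: Lambda_ij = diag(a_i) X^(i) - G_ij X^(j)
            blocks.append(np.diag(ai) @ Xi - G @ Xj)
    Lam = np.vstack(blocks)
    # First r right singular vectors of the stacked matrix Lambda.
    _, _, Vt = np.linalg.svd(Lam, full_matrices=False)
    return Vt[:r].T  # d x r projection matrix
```

For two well-separated Gaussian classes, the leading direction recovered by this sketch aligns with the axis separating the class means, which is the behavior the algorithm is designed for.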
Open Source Code: No
Evidence: The paper does not provide an explicit statement about releasing source code for the methodology, nor does it include a link to a code repository.
Open Datasets: Yes
Evidence: "We consider seven multi-class real-world datasets: Breast Cancer Wisconsin (WDBC), Letter Recognition (LETTER), Pop failures (POP), QSAR biodegradation (BIODEG), Connectionist Bench Sonar (SONAR), and Optical Recognition of Handwritten Digits (OPTDIGITS)." "All the datasets are downloaded from UCI machine learning repository [6]"
Dataset Splits: Yes
Evidence: "For each dataset, we replicate the experiment one hundred times. In each replication, each dataset is randomly divided into the training set and the testing set of equal sizes."
Hardware Specification: No
Evidence: The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments.
Software Dependencies: No
Evidence: The paper mentions using the R packages 'Rdimtools' and 'dr' but does not provide specific version numbers for these or any other software dependencies needed to replicate the experiments.
Experiment Setup: Yes
Evidence: "Throughout the simulation, we set n = 400 and p = 10, 20, 30. ... For all other methods, we consider five different choices of structure dimension r, i.e., r = {2, 4, 6, 8, 10}. For each r, the training set is first projected to a r-dimensional subspace. We then apply K-nearest neighbor classifier to the projected training set, and K is set to be 10."
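The evaluation protocol described above (one replication = an equal-size random train/test split, projection to an r-dimensional subspace, then a K-nearest-neighbor classifier with K = 10) can be sketched with NumPy. The function names and the plain majority-vote KNN below are illustrative assumptions, not the authors' code; any d x r projection matrix V (e.g., one estimated by a dimension reduction method) can be plugged in.

```python
import numpy as np


def knn_predict(X_train, y_train, X_test, k=10):
    """Majority-vote K-nearest-neighbor classifier with Euclidean distance."""
    # Pairwise squared distances: (n_test, n_train)
    d2 = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(-1)
    nn = np.argsort(d2, axis=1)[:, :k]  # indices of the k nearest neighbors
    preds = []
    for row in y_train[nn]:
        vals, counts = np.unique(row, return_counts=True)
        preds.append(vals[np.argmax(counts)])
    return np.array(preds)


def evaluate_split(X, y, V, k=10, rng=None):
    """One replication of the protocol: equal-size random train/test split,
    project onto the columns of V, classify with KNN, return test accuracy."""
    rng = np.random.default_rng(rng)
    n = len(y)
    perm = rng.permutation(n)
    tr, te = perm[: n // 2], perm[n // 2:]
    Z = X @ V  # project onto the r estimated directions
    y_hat = knn_predict(Z[tr], y[tr], Z[te], k=k)
    return (y_hat == y[te]).mean()
```

Repeating `evaluate_split` one hundred times with fresh random splits and averaging the returned accuracies mirrors the replication scheme described in the paper.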