Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

The Maximum Separation Subspace in Sufficient Dimension Reduction with Categorical Response

Authors: Xin Zhang, Qing Mai, Hui Zou

JMLR 2020

Reproducibility Variable Result LLM Response
Research Type Experimental Numerical studies show MASES exhibits superior performance as compared with competing SDR methods in specific settings. ... In Sections 5 and 6, we present extensive simulation results and a real data illustration... Finally, all technical proofs are relegated to the Appendix.
Researcher Affiliation Academia Xin Zhang (EMAIL), Qing Mai (EMAIL), Department of Statistics, Florida State University, Tallahassee, FL 32306, USA; Hui Zou (EMAIL), School of Statistics, University of Minnesota, Minneapolis, MN 55455, USA
Pseudocode No The paper describes the estimation procedure and derivatives in Section 4.1 and 4.2, but does not present a structured pseudocode or algorithm block.
Open Source Code No Our current implementation adopts the sg_min Matlab package for Stiefel and Grassmann manifolds optimization (Edelman et al., 1998), which preserves the orthogonality constraint B^T B = I_d. Other numerical methods for optimization with orthogonality constraints (e.g. Wen and Yin, 2013) can also be straightforwardly incorporated into our implementation. ... We would like to thank Dr. Andreas Artemiou from the Cardiff University for sending us the R code for the linear PSVM method
Open Datasets Yes We revisit a discriminant analysis data set from Cook and Forzani (2009), where the goal is to distinguish n1 = 58 birds, n2 = 64 planes and n3 = 43 cars based on 13 continuous SDMFCC variables
Dataset Splits No For each simulation setting, a random vector β and its orthogonal completion B_0 ∈ R^{p×(p−1)} are randomly simulated such that (β, B_0) is an orthogonal basis for R^p. We compared MASES with competitors using the angle between the estimated direction β̂ and the truth β. Since β is just a vector, we also compared MASES with the direction estimated by logistic regression, in addition to the SDR methods. We considered various data generating processes from the following inverse models. We generated i.i.d. samples of X | (Y = j) with sample size n_j = 100 for each class j = 1, 2. ... For the above models, we set the total sample size to be n = n_1 + n_2 = 200 for models with binary response and n = n_1 + n_2 + n_3 = 300 for models with three classes.
Hardware Specification No No specific hardware details (like CPU/GPU models, memory, or processor types) are mentioned in the paper for running the experiments.
Software Dependencies No Our current implementation adopts the sg_min Matlab package for Stiefel and Grassmann manifolds optimization (Edelman et al., 1998)... We would like to thank Dr. Andreas Artemiou from the Cardiff University for sending us the R code for the linear PSVM method
Experiment Setup Yes Our choice of h_n is motivated by the optimal bandwidth of Gaussian basis functions, h_n = 1.06 σ̂ n^{−1/5}, where we use σ̂ = 1 as the sample standard deviation if in practice we standardize the predictor X initially. ... In our experience, a properly chosen constant δ_n has little effect on the estimation of MASES. Therefore, in all our numerical studies, we set δ_n = 0 for simplicity. ... When d = 1, we randomly generate 100 directions B_1, ..., B_100 ∈ R^p and select the one with the smallest F(B) as the initial estimator; when d > 1, we use the following sequential algorithm to obtain an initial estimator B̂_d ∈ R^{p×d} and use it in the full Grassmannian optimization of F(B).
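The orthogonality constraint quoted in the Open Source Code row is typically maintained during optimization by retracting each iterate back onto the Stiefel manifold. The sketch below uses a generic QR-based retraction, not the sg_min Matlab routine the authors used; the step size and the random "gradient" are placeholders for illustration only.

```python
import numpy as np

def qr_retraction(B):
    """Map a p x d matrix onto the Stiefel manifold via QR,
    so that the result satisfies B.T @ B = I_d."""
    Q, R = np.linalg.qr(B)
    # Fix column signs so the retraction is deterministic.
    return Q * np.sign(np.diag(R))

# One illustrative gradient step followed by a retraction.
rng = np.random.default_rng(0)
B = qr_retraction(rng.standard_normal((5, 2)))   # feasible starting point
grad = rng.standard_normal((5, 2))               # placeholder gradient
B_new = qr_retraction(B - 0.1 * grad)            # step, then retract

print(np.allclose(B_new.T @ B_new, np.eye(2)))   # True: constraint preserved
```

Any descent scheme that interleaves such retractions with gradient steps keeps every iterate feasible, which is the role sg_min (or the Wen and Yin, 2013, approach mentioned in the quote) plays in the authors' implementation.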
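The evaluation criterion quoted in the Dataset Splits row, the angle between the estimated direction and the truth, is straightforward to compute. A minimal sketch (the function name `angle_deg` and the example vectors are illustrative, not from the paper); the absolute value makes the metric invariant to the sign of the estimated direction:

```python
import numpy as np

def angle_deg(b_hat, beta):
    """Angle in degrees between an estimated direction and the truth.
    Directions are sign-invariant, so use the absolute cosine."""
    cos = abs(b_hat @ beta) / (np.linalg.norm(b_hat) * np.linalg.norm(beta))
    return np.degrees(np.arccos(np.clip(cos, 0.0, 1.0)))

beta = np.array([1.0, 0.0, 0.0])
print(angle_deg(np.array([1.0, 1.0, 0.0]), beta))  # ~45 degrees
print(angle_deg(-beta, beta))                      # 0.0: sign is ignored
```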
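The bandwidth rule and the random-restart initialization quoted in the Experiment Setup row can both be sketched directly. The names `mases_bandwidth` and `init_direction` are hypothetical, and the objective passed to `init_direction` below is a toy stand-in for the paper's F(B), which is not reproduced here:

```python
import numpy as np

def mases_bandwidth(n, sigma_hat=1.0):
    """Gaussian rule-of-thumb bandwidth h_n = 1.06 * sigma_hat * n**(-1/5).
    sigma_hat = 1 corresponds to a standardized predictor X."""
    return 1.06 * sigma_hat * n ** (-1 / 5)

def init_direction(F, p, n_starts=100, seed=0):
    """Random-restart initialization for d = 1: draw n_starts unit vectors
    in R^p and keep the one with the smallest objective value F(B)."""
    rng = np.random.default_rng(seed)
    candidates = rng.standard_normal((n_starts, p))
    candidates /= np.linalg.norm(candidates, axis=1, keepdims=True)
    return min(candidates, key=F)

# Sample sizes matching the simulations quoted above.
print(round(mases_bandwidth(200), 4))  # binary-response models, n = 200
print(round(mases_bandwidth(300), 4))  # three-class models, n = 300

# Toy objective: prefer directions aligned with the first coordinate axis.
beta0 = init_direction(lambda b: -abs(b[0]), p=5)
print(round(np.linalg.norm(beta0), 6))  # 1.0: candidates are unit vectors
```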