Multinomial Logistic Regression: Asymptotic Normality on Null Covariates in High-Dimensions

Authors: Kai Tan, Pierre C Bellec

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive simulation studies on synthetic data corroborate these asymptotic results and confirm the validity of proposed p-values for testing the significance of a given feature.
Researcher Affiliation | Academia | Kai Tan, Department of Statistics, Rutgers University, Piscataway, NJ 08854, kai.tan@rutgers.edu; Pierre C. Bellec, Department of Statistics, Rutgers University, Piscataway, NJ 08854, pierre.bellec@rutgers.edu
Pseudocode | No | The paper does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks.
Open Source Code | Yes | The source code for generating all of the experimental results in this paper can be found in the supplementary material.
Open Datasets | Yes | We conduct a real data analysis by applying the proposed test to heart disease data from the UCI Machine Learning Repository (link: http://archive.ics.uci.edu/ml/machine-learning-databases/heart-disease/processed.cleveland.data).
Dataset Splits | No | The paper describes the dataset used (heart disease data, 297 instances, 13 features) and how the response variable was transformed (3 classes). It mentions generating a noise variable but does not specify any training, validation, or test splits for the data.
Hardware Specification | No | The paper mentions the use of the 'Amarel cluster' at Rutgers in the acknowledgments but does not provide specific details about the hardware (e.g., CPU, GPU models, memory) used for the experiments.
Software Dependencies | No | The paper mentions 'sklearn.linear_model.LogisticRegression from Pedregosa et al. [2011]' as a tool used, but it does not specify any software versions for this or any other libraries or programming languages.
Experiment Setup | Yes | We set $p = 1000$ and consider different combinations of $(n, K)$. The covariance matrix $\Sigma$ is specified to be the correlation matrix of an AR(1) model with parameter $\rho = 0.5$, that is, $\Sigma = (0.5^{\lvert i-j \rvert})_{p \times p}$. We generate the regression coefficients $A \in \mathbb{R}^{p \times K}$ once and for all as follows: sample $A_0 \in \mathbb{R}^{p \times K}$ with first $p/4$ rows being i.i.d. $N(0, I_K)$, and set the remaining rows to 0. We then scale the coefficients by defining $A = A_0 (A_0^\top \Sigma A_0)^{-1/2}$ so that $A^\top \Sigma A = I_K$. With this construction, the $p$-th variable is always a null covariate, and we use this null coordinate $j = p$ to demonstrate the effectiveness of our theoretical results presented in Theorem 2.2 and the suggested test for testing $H_0$ as described in (1.10). Using the above settings, we generate the design matrix $X \in \mathbb{R}^{n \times p}$ from $N(0, \Sigma)$, and then simulate the labels from a multinomial logistic model as given in (1.8), using the coefficients $A \in \mathbb{R}^{p \times K}$. For each simulation setting, we perform 5,000 repetitions.
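The data-generating steps quoted in the Experiment Setup row can be sketched in NumPy as follows. This is a minimal illustration, not the authors' supplementary code. The function names (`ar1_correlation`, `make_coefficients`, `simulate_dataset`) are hypothetical, the sizes are reduced from p = 1000 so the example runs quickly, and the label model is assumed to be a softmax over the K columns of XA, since equation (1.8) is not reproduced on this page.

```python
import numpy as np

def ar1_correlation(p, rho=0.5):
    """AR(1) correlation matrix with entries rho^|i-j|."""
    idx = np.arange(p)
    return rho ** np.abs(idx[:, None] - idx[None, :])

def make_coefficients(Sigma, K, rng):
    """Build A = A0 (A0^T Sigma A0)^{-1/2}, so that A^T Sigma A = I_K."""
    p = Sigma.shape[0]
    A0 = np.zeros((p, K))
    # First p/4 rows are i.i.d. N(0, I_K); the remaining rows stay 0.
    A0[: p // 4] = rng.standard_normal((p // 4, K))
    M = A0.T @ Sigma @ A0
    # Inverse square root of the symmetric positive definite K x K matrix M.
    w, V = np.linalg.eigh(M)
    M_inv_sqrt = (V / np.sqrt(w)) @ V.T
    return A0 @ M_inv_sqrt

def simulate_dataset(A, Sigma, n, rng):
    """Draw rows of X from N(0, Sigma) and labels from softmax(X A) (assumed model)."""
    p, K = A.shape
    L = np.linalg.cholesky(Sigma)
    X = rng.standard_normal((n, p)) @ L.T  # each row has covariance L L^T = Sigma
    logits = X @ A
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)
    # Inverse-CDF sampling of one class label per row.
    u = rng.random(n)
    y = (u[:, None] > probs.cumsum(axis=1)).sum(axis=1)
    y = np.minimum(y, K - 1)  # guard against floating-point rounding in cumsum
    return X, y

rng = np.random.default_rng(0)
p, n, K = 40, 200, 3  # reduced sizes for illustration; the paper uses p = 1000
Sigma = ar1_correlation(p)
A = make_coefficients(Sigma, K, rng)
X, y = simulate_dataset(A, Sigma, n, rng)
```

Because the last 3p/4 rows of A0 are zero, the same rows of A are zero, which is why the p-th variable is a null covariate in this construction. Repeating `simulate_dataset` with a fixed `A` would mirror the "once and for all" coefficients and the repeated simulations described in the row above.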