reproducibilityindex.ai

Multinomial Logistic Regression: Asymptotic Normality on Null Covariates in High-Dimensions

Authors: Kai Tan, Pierre C Bellec

NeurIPS 2023

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive simulation studies on synthetic data corroborate these asymptotic results and confirm the validity of proposed p-values for testing the significance of a given feature.
Researcher Affiliation	Academia	Kai Tan Department of Statistics Rutgers University Piscataway, NJ 08854 kai.tan@rutgers.edu Pierre C. Bellec Department of Statistics Rutgers University Piscataway, NJ 08854 pierre.bellec@rutgers.edu
Pseudocode	No	The paper does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks.
Open Source Code	Yes	The source code for generating all of the experimental results in this paper can be found in the supplementary material.
Open Datasets	Yes	We conduct a real data analysis by applying the proposed test to heart disease data from the UCI Machine Learning Repository (link: http://archive.ics.uci.edu/ml/machine-learning-databases/heart-disease/processed.cleveland.data).
Dataset Splits	No	The paper describes the dataset used (heart disease data, 297 instances, 13 features) and how the response variable was transformed (3 classes). It mentions generating a noise variable but does not specify any training, validation, or test splits for the data.
Hardware Specification	No	The paper mentions the use of the 'Amarel cluster' at Rutgers in the acknowledgments but does not provide specific details about the hardware (e.g., CPU, GPU models, memory) used for the experiments.
Software Dependencies	No	The paper mentions 'sklearn.linear_model.LogisticRegression from Pedregosa et al. [2011]' as a tool used, but it does not specify versions for this or any other library or programming language.
Experiment Setup	Yes	We set $p = 1000$ and consider different combinations of $(n, K)$. The covariance matrix $\Sigma$ is specified to be the correlation matrix of an AR(1) model with parameter $\rho = 0.5$, that is, $\Sigma = (0.5^{|i-j|})_{p \times p}$. We generate the regression coefficients $A \in \mathbb{R}^{p \times K}$ once and for all as follows: sample $A_0 \in \mathbb{R}^{p \times K}$ with first $p/4$ rows being i.i.d. $N(0, I_K)$, and set the remaining rows to 0. We then scale the coefficients by defining $A = A_0 (A_0^T \Sigma A_0)^{-1/2}$ so that $A^T \Sigma A = I_K$. With this construction, the $p$-th variable is always a null covariate, and we use this null coordinate $j = p$ to demonstrate the effectiveness of our theoretical results presented in Theorem 2.2 and the suggested test for testing $H_0$ as described in (1.10). Using the above settings, we generate the design matrix $X \in \mathbb{R}^{n \times p}$ from $N(0, \Sigma)$, and then simulate the labels from a multinomial logistic model as given in (1.8), using the coefficients $A \in \mathbb{R}^{p \times K}$. For each simulation setting, we perform 5,000 repetitions.
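
The data-generating process described in the Experiment Setup row can be sketched in NumPy as follows. This is a minimal illustration and not the authors' supplementary code. The function names are hypothetical, and the label model is an assumption: equation (1.8) is not reproduced on this page, so the sketch uses a plain softmax over the K columns of X A. If the paper's model uses a reference class, the logits would need an extra zero column.

```python
import numpy as np


def ar1_correlation(p, rho=0.5):
    """AR(1) correlation matrix with entries rho^|i-j|."""
    idx = np.arange(p)
    return rho ** np.abs(idx[:, None] - idx[None, :])


def make_coefficients(p, K, Sigma, rng):
    """Sample A0 with N(0, I_K) rows in the first p/4 positions, zeros elsewhere,
    then rescale so that A^T Sigma A = I_K."""
    A0 = np.zeros((p, K))
    A0[: p // 4] = rng.standard_normal((p // 4, K))
    M = A0.T @ Sigma @ A0  # K x K, symmetric positive definite
    evals, evecs = np.linalg.eigh(M)
    M_inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T  # symmetric M^{-1/2}
    return A0 @ M_inv_sqrt


def simulate_multinomial(n, Sigma, A, rng):
    """Draw X with rows from N(0, Sigma) and labels from a softmax model.

    ASSUMPTION: class probabilities are softmax(X @ A) over K classes;
    the paper's (1.8) may be parametrized differently.
    """
    p, K = A.shape
    chol = np.linalg.cholesky(Sigma)
    X = rng.standard_normal((n, p)) @ chol.T
    logits = X @ A
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)
    # Inverse-CDF sampling of one label per row.
    u = rng.random(n)
    y = (probs.cumsum(axis=1) < u[:, None]).sum(axis=1)
    y = np.minimum(y, K - 1)  # guard against rounding in the cumulative sum
    return X, y
```

With p = 1000, one repetition would call ar1_correlation(1000) and make_coefficients once, then simulate_multinomial for each of the 5,000 repetitions with fresh randomness. Because the last 3p/4 rows of A are zero, the p-th covariate is null by construction.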