kernelPSI: a Post-Selection Inference Framework for Nonlinear Variable Selection

Authors: Lotfi Slim, Clément Chatelain, Chloé-Agathe Azencott, Jean-Philippe Vert

ICML 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We first demonstrate the statistical validity of our PSI procedure, which we refer to as kernelPSI. We simulate a design matrix X of n = 100 samples and p = 50 features, partitioned into S = 10 disjoint and mutually independent subgroups of 5 features each, drawn from a normal distribution centered at 0 with covariance matrix V_ij = ρ^|i−j|, i, j ∈ {1, …, 5}. We set the correlation parameter ρ to 0.6. To each group corresponds a local Gaussian kernel K_i of variance σ² = 5. The outcome Y is drawn as Y = θ K_{1:3} U_1 + ε, where K_{1:3} = K_1 + K_2 + K_3, U_1 is the eigenvector corresponding to the largest eigenvalue of K_{1:3}, and ε is Gaussian noise centered at 0. We vary the effect size θ over θ ∈ {0.0, 0.1, 0.2, 0.3, 0.4, 0.5}, and resample Y 1,000 times to create 1,000 simulations.
Researcher Affiliation | Collaboration | (1) Translational Sciences, SANOFI R&D, France; (2) MINES ParisTech, PSL Research University, CBIO Centre for Computational Biology, F-75006 Paris, France; (3) Institut Curie, PSL Research University, INSERM, U900, F-75005 Paris, France; (4) Google Brain, F-75009 Paris, France.
Pseudocode | Yes | Algorithm 1: Forward stepwise kernel selection.
Open Source Code | No | The paper provides neither a link to its source code nor an explicit statement that the code has been released.
Open Datasets | Yes | Here we study the flowering time phenotype FT GH of the Arabidopsis thaliana dataset of Atwell et al. (2010).
Dataset Splits | No | The paper mentions simulating and resampling data, but does not specify explicit training/validation/test splits (percentages or counts) or cross-validation details needed for reproduction.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used to run the experiments.
Software Dependencies | No | The paper does not specify software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x).
Experiment Setup | No | The paper describes the experimental design for the simulations and the case study, but does not report the specific hyperparameter values (e.g., learning rate, batch size, epochs, optimizer settings) needed for reproduction.
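The simulation protocol quoted under Research Type is concrete enough to sketch in code. The NumPy sketch below is an illustration under stated assumptions, not the authors' implementation: the Gaussian-kernel parameterisation exp(−‖x − x′‖² / (2σ²)), unit noise variance (the paper only says ε is centered at 0), the random seed, and the choice θ = 0.3 are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)  # assumed seed, for reproducibility of the sketch

n, S, ps = 100, 10, 5           # samples, groups, features per group (p = S * ps = 50)
rho, sigma2, theta = 0.6, 5.0, 0.3  # theta is one of the effect sizes {0.0, ..., 0.5}

# Within-group covariance V_ij = rho^|i - j|; groups are drawn independently.
idx = np.arange(ps)
V = rho ** np.abs(idx[:, None] - idx[None, :])
X = np.hstack([rng.multivariate_normal(np.zeros(ps), V, size=n) for _ in range(S)])

def gaussian_kernel(Z, sigma2):
    # Assumed form: K(x, x') = exp(-||x - x'||^2 / (2 * sigma2))
    sq = np.sum(Z ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * Z @ Z.T
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma2))

# One local Gaussian kernel per group of 5 features.
Ks = [gaussian_kernel(X[:, g * ps:(g + 1) * ps], sigma2) for g in range(S)]

# Outcome: Y = theta * K_{1:3} U_1 + eps, with U_1 the top eigenvector of K1 + K2 + K3.
K13 = Ks[0] + Ks[1] + Ks[2]
eigvals, eigvecs = np.linalg.eigh(K13)   # eigenvalues in ascending order
u1 = eigvecs[:, -1]                      # eigenvector of the largest eigenvalue
Y = theta * K13 @ u1 + rng.standard_normal(n)  # assumed unit-variance noise
```

Resampling Y 1,000 times (redrawing only the noise term for each replicate) then reproduces the 1,000 simulations per effect size described in the quote.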