Post-selection inference with HSIC-Lasso

Authors: Tobias Freidling, Benjamin Poignard, Héctor Climente-González, Makoto Yamada

ICML 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "The performance of our method is illustrated by both artificial and real-world data based experiments, which emphasise a tight control of the type-I error, even for small sample sizes."
Researcher Affiliation | Academia | (1) Department of Pure Mathematics and Mathematical Statistics, University of Cambridge, Cambridge, United Kingdom; (2) Graduate School of Economics, Osaka University, Osaka, Japan; (3) Center for Advanced Intelligence Project (AIP), RIKEN, Kyoto, Japan; (4) Graduate School of Informatics, Kyoto University, Kyoto, Japan.
Pseudocode | Yes | "The supplementary material contains a more detailed description of the algorithm in pseudocode."
Open Source Code | Yes | "The source code for the following experiments is available on GitHub: tobias-freidling/hsic-lasso-psi."
Open Datasets | Yes | "Now we proceed to applying our proposed algorithm to benchmark datasets from the UCI Repository and the Broad Institute's Single Cell Portal, respectively."
Dataset Splits | Yes | "Moreover, we use a quarter of the data for the first fold, select the hyper-parameter λ applying 10-fold cross-validation with MSE, use a non-adaptive Lasso-penalty and do not conduct screening as the number of considered features is already small enough." (see the split and cross-validation sketch below)
Hardware Specification | No | "No specific hardware details (e.g., GPU models, CPU specifications, memory) used for running experiments are provided."
Software Dependencies | No | "The paper mentions software like Python, the mskernel package, and scikit-learn but does not provide specific version numbers for these dependencies."
Experiment Setup | Yes | "For continuous data, we use Gaussian kernels where the bandwidth parameter is chosen according to the median heuristic, cf. (Schölkopf & Smola, 2018); for discrete data with n_c samples in category c, we apply the normalised delta kernel, given by l(y, y') := 1/n_c if y = y' = c, and 0 otherwise. Moreover, we use a quarter of the data for the first fold, select the hyper-parameter λ applying 10-fold cross-validation with MSE, use a non-adaptive Lasso-penalty and do not conduct screening as the number of considered features is already small enough. On the second fold, we estimate M with the block estimator of size B = 10 as it is computationally less expensive than the unbiased estimator and leads to similar results. The covariance matrix Σ of H is estimated based on the summands of the block (3) and incomplete U-statistic (4) estimators, respectively. To this end, we use the oracle approximating shrinkage (OAS) estimator presented by Chen et al. (2010), which is particularly tailored to high-dimensional Gaussian data. We fix the significance level at α = 0.05 and simulate 100 datasets for each considered sample size." (see the kernel and covariance sketches below)
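
The split-and-select protocol quoted in the Dataset Splits row can be made concrete with a short sketch. This is not the authors' HSIC-Lasso pipeline: it uses scikit-learn's plain LassoCV as a stand-in, and the design matrix X, response y, and random seeds are hypothetical. It only illustrates the mechanics of reserving a quarter of the data for selection and choosing λ by 10-fold cross-validation with MSE.

```python
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 30))            # hypothetical design matrix
y = X[:, 0] + 0.5 * rng.normal(size=200)  # hypothetical response

# First fold: reserve a quarter of the data for selection,
# keeping the remainder for post-selection inference
X_sel, X_inf, y_sel, y_inf = train_test_split(X, y, train_size=0.25, random_state=0)

# Choose lambda by 10-fold cross-validation; LassoCV scores folds by MSE
lasso = LassoCV(cv=10).fit(X_sel, y_sel)
lambda_hat = lasso.alpha_                 # selected penalty strength
selected = np.flatnonzero(lasso.coef_)    # features kept by the non-adaptive penalty
```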
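The kernel choices in the Experiment Setup row translate directly into code. The sketch below is a minimal NumPy/SciPy rendering of the two kernels, not the mskernel implementation used in the paper; the function names are invented here, and the median heuristic is taken as the median of the non-zero pairwise distances.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

def gaussian_kernel_median(x):
    """Gaussian kernel matrix with the bandwidth set by the median heuristic."""
    d = squareform(pdist(np.asarray(x, dtype=float).reshape(-1, 1)))
    sigma = np.median(d[d > 0])               # median of non-zero pairwise distances
    return np.exp(-d ** 2 / (2 * sigma ** 2))

def normalised_delta_kernel(y):
    """Normalised delta kernel: l(y, y') = 1/n_c if y = y' = c, else 0."""
    y = np.asarray(y)
    k = np.zeros((len(y), len(y)))
    for c in np.unique(y):
        idx = np.flatnonzero(y == c)          # samples in category c
        k[np.ix_(idx, idx)] = 1.0 / len(idx)  # 1/n_c inside the category block
    return k
```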
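For the covariance step, the OAS estimator of Chen et al. (2010) is available in scikit-learn as sklearn.covariance.OAS. The sketch below applies it to a hypothetical matrix of estimator summands (one row per block or incomplete U-statistic term), mirroring how the quoted setup describes estimating Σ; the array contents are placeholders.

```python
import numpy as np
from sklearn.covariance import OAS

# Hypothetical summands: one row per block / incomplete U-statistic term,
# one column per selected feature
H_summands = np.random.default_rng(1).normal(size=(40, 8))

# Oracle approximating shrinkage estimate of the covariance of H
sigma_hat = OAS().fit(H_summands).covariance_
print(sigma_hat.shape)  # (8, 8)
```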