Discovering Conditionally Salient Features with Statistical Guarantees

Authors: Jaime Roquero Gimenez, James Zou

ICML 2019

Reproducibility assessment — each entry gives the variable, the result, and the LLM's supporting response:
Research Type: Experimental. "We implement this method and present an algorithm that automatically partitions the feature space such that it enhances the differences between selected sets in different regions, and validate the statistical theoretical results with experiments." (Abstract); "Our main contributions of this paper are in laying out the new framework for conditional feature selection along with proposing a new knockoff algorithm with mathematical guarantees. We also validate the algorithm on experiments." (Our Contributions); "We now run experiments and the first goal is to show that our main theorem holds." (Section 4)
Researcher Affiliation: Academia. "1 Department of Statistics, Stanford University, Stanford, USA; 2 Department of Biomedical Data Science, Stanford University, Stanford, USA."
Pseudocode: Yes. "Algorithm 1: Knockoffs with Local Importance Scores Feature Selection Procedure" (Section 3.2); "Algorithm 2: One-Step Greedy Feature Space Partition" (Section 3.3)
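For context on what a knockoff-based selection procedure like Algorithm 1 involves, here is a minimal sketch of the generic knockoff+ selection rule of Barber and Candès, which such procedures build on. It assumes feature statistics `W` (one per feature, large positive values indicating evidence of importance) have already been computed; it is not the authors' exact local procedure.

```python
import numpy as np

def knockoff_select(W, q=0.2):
    """Generic knockoff+ selection rule: find the smallest threshold t
    whose estimated false discovery proportion, (1 + #{W_j <= -t}) /
    max(1, #{W_j >= t}), is at most the target FDR level q, then select
    all features with W_j >= t."""
    for t in np.sort(np.abs(W[W != 0])):  # candidate thresholds
        fdp_hat = (1 + np.sum(W <= -t)) / max(1, np.sum(W >= t))
        if fdp_hat <= q:
            return np.flatnonzero(W >= t)  # indices of selected features
    return np.array([], dtype=int)  # no threshold passes: select nothing
```

For example, `knockoff_select(np.array([4.0, 3.5, 3.0, 2.8, 2.6, 2.4, -0.5, 0.3, -0.2, 2.2]), q=0.2)` selects the strongly positive statistics while the sign-flipped ones calibrate the threshold.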
Open Source Code: No. The paper does not contain any explicit statement about releasing open-source code for the described methodology, nor does it provide a link to a code repository.
Open Datasets: No. "We therefore sample the datasets X, X̃ so that X simulates the SNPs of a cohort of patients, i.e. a matrix of 0, 1, and 2." (Section 4) The paper describes a synthetic data generation process but does not indicate that the dataset used for experiments is publicly available, nor does it provide access information (link, DOI, citation) for a specific public dataset.
Dataset Splits: No. "We vary the size of the dataset to show that our method still controls local FDR even though the number of points in a given subregion is very limited." (Section 4) The paper mentions varying the dataset size but does not specify train/validation/test splits, absolute sample counts for each split, or predefined splits with citations.
Hardware Specification: No. The paper does not provide any specific details about the hardware (e.g., GPU models, CPU types, memory) used to run the experiments.
Software Dependencies: No. The paper mentions using logistic regression as part of the importance score calculation, but it does not specify any software dependencies with version numbers (e.g., Python, PyTorch, scikit-learn versions).
Experiment Setup: No. "Our target FDR is q = 0.2 and locally we consider as importance scores the absolute values of the coefficients in a logistic regression." (Section 4) The paper provides some high-level experimental parameters, such as the target FDR level and the type of importance scores used, but it lacks specific hyperparameters (e.g., learning rate, batch size, number of epochs) and system-level training settings needed for reproducibility.
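To illustrate the quoted setup, here is a hypothetical sketch of how importance scores of that kind are typically formed in knockoff methods: fit a logistic regression on the augmented matrix [X, X̃] and contrast each coefficient's absolute value with that of its knockoff copy. The gradient-descent fit, step counts, and learning rate below are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def knockoff_stats(X, Xk, y, steps=3000, lr=0.5):
    """Fit plain logistic regression by gradient descent on [X, X_knockoff]
    and return W_j = |beta_j| - |beta_knockoff_j| (absolute coefficients
    as importance scores). Large positive W_j suggests real signal."""
    XA = np.hstack([X, Xk])           # augmented design matrix [X, X~]
    n, d = XA.shape
    beta = np.zeros(d)
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(XA @ beta)))   # predicted probabilities
        beta -= lr * (XA.T @ (p - y)) / n        # averaged gradient step
    z = np.abs(beta)                   # importance score per column
    m = X.shape[1]
    return z[:m] - z[m:]               # feature score minus knockoff score
```

On synthetic data where only the first feature drives the response and the knockoffs are pure noise, W for that feature comes out clearly positive while the null features' statistics stay near zero, which is what the selection rule at level q = 0.2 exploits.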