Sobolev Independence Criterion
Authors: Youssef Mroueh, Tom Sercu, Mattia Rigotti, Inkit Padhi, Cicero Nogueira dos Santos
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct experiments validating SIC for feature selection in synthetic and real-world experiments. We show that SIC enables reliable and interpretable discoveries, when used in conjunction with the holdout randomization test and knockoffs to control the False Discovery Rate. Code is available at http://github.com/ibm/sic. and (from Section 8, Experiments) Synthetic Data Validation. We first validate our methods and compare them to baseline models in simulation studies on synthetic datasets where the ground truth is available by construction. |
| Researcher Affiliation | Collaboration | Youssef Mroueh, Tom Sercu, Mattia Rigotti, Inkit Padhi, Cicero Dos Santos. IBM Research & MIT-IBM Watson AI Lab. mroueh, mrigotti@us.ibm.com, inkit.padhi@ibm.com |
| Pseudocode | Yes | Algorithm 3 in Appendix B summarizes our stochastic BCD algorithm for training the Neural SIC. The algorithm consists of SGD updates to the network parameters and mirror descent updates to the feature-importance weights. and The principle in HRT [8] that we specify here for SIC in Algorithm 4 (given in Appendix B) is the following: instead of refitting SIC under H0, evaluating the mean of the witness function of SIC on a holdout set from the real distribution gives us p-values. |
| Open Source Code | Yes | Code is available at http://github.com/ibm/sic. |
| Open Datasets | Yes | We experiment with two datasets: A) Complex multivariate synthetic data (Sin Exp)... B) Liang Dataset. We show results on the benchmark dataset proposed by [34]... We consider as a real-world application the Cancer Cell Line Encyclopedia (CCLE) dataset [36]... The second real-world dataset that we analyze is the HIV-1 Drug Resistance [38]... |
| Dataset Splits | Yes | Table 1 shows the heldout MSE of a predictor trained on selected features, averaged over 100 runs (each run: new randomized 90%/10% data split, NN initialization). |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running experiments. |
| Software Dependencies | No | The paper mentions software like 'scikit-learn' and 'PyTorch' but does not specify their version numbers, which are necessary for full reproducibility. |
| Experiment Setup | Yes | We train all neural networks used in this work via the Adam optimizer [44] with a learning rate of 1e-4 for 25 epochs. We use PyTorch [45] for all neural network implementations. and We use Boosted SIC, by varying the batch sizes in N ∈ {10, 30, 50}, and computing the geometric mean of the importance scores produced by those three setups as the feature importance needed for Knockoffs. |
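The Boosted SIC aggregation quoted above (a geometric mean of importance scores from fits with different batch sizes) can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `boosted_importance` and the sample scores are hypothetical, and only the elementwise-geometric-mean step is taken from the paper's description.

```python
import numpy as np

def boosted_importance(importances):
    """Aggregate per-fit feature-importance vectors (one per batch-size
    setting, e.g. N in {10, 30, 50}) into a single score per feature
    via the elementwise geometric mean."""
    scores = np.asarray(importances, dtype=float)  # shape: (fits, features)
    # Geometric mean computed in log-space for numerical stability;
    # assumes non-negative scores (epsilon guards against log(0)).
    eps = 1e-12
    return np.exp(np.log(scores + eps).mean(axis=0))

# Hypothetical importance scores from three SIC fits:
runs = [
    [0.9, 0.1, 0.4],
    [0.8, 0.2, 0.5],
    [0.7, 0.1, 0.6],
]
print(boosted_importance(runs).round(3))  # one aggregated score per feature
```

The resulting vector would then serve as the feature-importance statistic fed to the knockoff filter for FDR control.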