Sobolev Independence Criterion

Authors: Youssef Mroueh, Tom Sercu, Mattia Rigotti, Inkit Padhi, Cicero Nogueira dos Santos

NeurIPS 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct experiments validating SIC for feature selection in synthetic and real-world experiments. We show that SIC enables reliable and interpretable discoveries when used in conjunction with the holdout randomization test and knockoffs to control the False Discovery Rate. Code is available at http://github.com/ibm/sic. And, from Section 8 (Experiments): Synthetic Data Validation. We first validate our methods and compare them to baseline models in simulation studies on synthetic datasets where the ground truth is available by construction.
Researcher Affiliation | Collaboration | Youssef Mroueh, Tom Sercu, Mattia Rigotti, Inkit Padhi, Cicero Dos Santos (IBM Research & MIT-IBM Watson AI lab), mroueh,mrigotti@us.ibm.com,inkit.padhi@ibm.com
Pseudocode | Yes | Algorithm 3 in Appendix B summarizes our stochastic BCD algorithm for training the Neural SIC. The algorithm consists of SGD updates to the network parameters and mirror descent updates to the feature importance weights. And: The principle of the HRT [8], which we specify for SIC in Algorithm 4 (given in Appendix B), is the following: instead of refitting SIC under H0, evaluating the mean of the witness function of SIC on a holdout set from the real distribution gives us p-values.
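The alternating structure described in the excerpt (gradient steps on network parameters, mirror descent on simplex-constrained importance weights) can be sketched as follows. This is a minimal illustration, not the paper's Algorithm 3: the quadratic objective, the variable names `theta`/`eta`, and the learning rates are all stand-ins for the actual SIC loss and hyperparameters.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 5                       # number of features
theta = rng.normal(size=d)  # stand-in for the neural network parameters
eta = np.full(d, 1.0 / d)   # feature importances on the probability simplex

def loss_grads(theta, eta):
    # Toy quadratic objective standing in for the SIC loss.
    g_theta = theta - eta   # gradient w.r.t. theta
    g_eta = -theta          # gradient w.r.t. eta
    return g_theta, g_eta

lr_theta, lr_eta = 1e-2, 1e-1
for _ in range(100):
    g_theta, g_eta = loss_grads(theta, eta)
    theta -= lr_theta * g_theta        # SGD step on the parameters
    eta *= np.exp(-lr_eta * g_eta)     # exponentiated-gradient (mirror descent) step
    eta /= eta.sum()                   # renormalize back onto the simplex
```

The exponentiated-gradient update keeps `eta` nonnegative and summing to one at every iteration, which is the usual reason mirror descent is preferred over projected SGD for simplex-constrained weights.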
Open Source Code | Yes | Code is available at http://github.com/ibm/sic.
Open Datasets | Yes | We experiment with two datasets: A) Complex multivariate synthetic data (Sin Exp)... B) Liang Dataset. We show results on the benchmark dataset proposed by [34]... We consider as a real-world application the Cancer Cell Line Encyclopedia (CCLE) dataset [36]... The second real-world dataset that we analyze is the HIV-1 Drug Resistance [38]...
Dataset Splits | Yes | Table 1 shows the heldout MSE of a predictor trained on selected features, averaged over 100 runs (each run: new randomized 90%/10% data split, NN initialization).
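The averaging protocol quoted above (a fresh randomized 90%/10% split per run) can be sketched as below. The function name and the seeding scheme are illustrative, not taken from the paper's code.

```python
import numpy as np

def random_split(n, test_frac=0.1, seed=0):
    # Fresh randomized 90%/10% split, one per run, keyed by the run's seed.
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n)
    n_test = int(n * test_frac)
    return perm[n_test:], perm[:n_test]  # train indices, test indices

# One split per run; in the protocol above this repeats for 100 runs,
# retraining the predictor each time and averaging the heldout MSE.
train_idx, test_idx = random_split(1000, seed=42)
```

Re-seeding per run ensures both the split and the network initialization vary, so the reported average reflects variability from both sources.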
Hardware Specification | No | The paper does not provide specific details about the hardware used for running experiments.
Software Dependencies | No | The paper mentions software such as scikit-learn and PyTorch but does not specify version numbers, which are necessary for full reproducibility.
Experiment Setup | Yes | We train all neural networks used in this work via the Adam optimizer [44] with a learning rate of 1e-4 for 25 epochs. We use PyTorch [45] for all neural network implementations. And: We use Boosted SIC, by varying the batch sizes in N ∈ {10, 30, 50}, and computing the geometric mean of the feature importances produced by those three setups as the feature importance needed for Knockoffs.
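The boosted aggregation step quoted above (a geometric mean of per-setup importance vectors) is straightforward to sketch. The function name `boosted_importance` is illustrative; only the geometric-mean rule comes from the excerpt.

```python
import numpy as np

def boosted_importance(scores):
    # scores: one importance vector per batch-size setting, e.g. N in {10, 30, 50}.
    # Returns the elementwise geometric mean across settings, computed in
    # log-space for numerical stability (assumes strictly positive scores).
    scores = np.asarray(scores, dtype=float)
    return np.exp(np.log(scores).mean(axis=0))

# Toy example with two features and three batch-size settings:
imp = boosted_importance([[0.2, 0.8],
                          [0.4, 0.6],
                          [0.1, 0.9]])
# First feature: (0.2 * 0.4 * 0.1) ** (1/3) = 0.2
```

The geometric mean penalizes features whose importance is unstable across batch sizes: a feature must score well in every setup to keep a high aggregate score, which is the desired behavior before feeding importances into the knockoff filter.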