Statistically Valid Variable Importance Assessment through Conditional Permutations
Authors: Ahmad Chamma, Denis A. Engemann, Bertrand Thirion
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Section 5 (Experiments & Results): "We conduct extensive benchmarks on synthetic and heterogeneous multimodal real-world biomedical data, tapping into different correlation levels and data-generating scenarios for both classification and regression." |
| Researcher Affiliation | Collaboration | Ahmad Chamma, Inria, Université Paris-Saclay, CEA (ahmad.chamma@inria.fr); Denis A. Engemann, Roche Pharma Research and Early Development, Neuroscience and Rare Diseases, Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd., Basel, Switzerland (denis.engemann@roche.com); Bertrand Thirion, Inria, Université Paris-Saclay, CEA (bertrand.thirion@inria.fr) |
| Pseudocode | Yes | Algorithm 1 Conditional sampling step: The algorithm implements the conditional sampling step in place of the permutation approach when computing the p-value of variable xj |
| Open Source Code | Yes | We propose a reusable library for simulation experiments and real-world applications of our method on a public GitHub repo: https://github.com/achamma723/Variable_Importance. |
| Open Datasets | Yes | A recent real-world data analysis of the UK Biobank dataset reported successful machine learning analysis of individual characteristics. The UK Biobank project (UKBB) curates phenotypic and imaging data from a prospective cohort of volunteers drawn from the general population of the UK [Constantinescu et al., 2022]. Age prediction from brain activity (MEG) in the Cam-CAN dataset: following the work of Engemann et al. [2020], we have applied CPI-DNN to the problem of age prediction from brain activity in different frequencies recorded with magnetoencephalography (MEG) in the Cam-CAN dataset. |
| Dataset Splits | Yes | Throughout the paper, we rely on an i.i.d. sampling train/test partition scheme where the n samples are divided into n_train training and n_test test samples, and our implementation involves a 2-fold internal validation (the training set is further split to obtain a validation set for hyperparameter tuning). |
| Hardware Specification | No | The paper mentions 'per core on 100 cores' when discussing computation time, but it does not specify the type or model of CPU/GPU or any other detailed hardware specifications used for the experiments. |
| Software Dependencies | No | The paper mentions general software components like deep neural networks and random forests, but it does not specify any software libraries with version numbers (e.g., Python, PyTorch, scikit-learn versions) required to replicate the experiments. |
| Experiment Setup | No | The paper states that 'hyperparameter tuning' and '2-fold internal validation' were used for models like Random Forests (e.g., 'the max depth of the Random Forest is chosen based on the performance with 2-fold cross validation'), but it does not explicitly provide the specific values for these hyperparameters or other system-level training settings in the main text. |
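The Pseudocode row above references the paper's Algorithm 1, a conditional sampling step used in place of plain permutation when computing the p-value of a variable x_j. A minimal sketch of that idea in Python is given below; it is not the authors' implementation. The function name `conditional_permutation_importance`, the use of a Random Forest as the conditional model, a linear base learner, and the t-test for the p-value are illustrative assumptions here (the paper's CPI-DNN variant uses a deep neural network as the learner).

```python
import numpy as np
from scipy import stats
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression


def conditional_permutation_importance(model, X, y, j, n_perm=50, rng=None):
    """Sketch of a conditional sampling step for variable j.

    Instead of permuting x_j outright, model E[x_j | x_-j] and permute
    only the residual, so the reconstructed x_j keeps its dependence on
    the other covariates. Importance is the mean per-sample squared-loss
    increase; the p-value comes from a one-sided t-test on those
    per-sample loss differences.
    """
    rng = np.random.default_rng(rng)
    X_minus_j = np.delete(X, j, axis=1)

    # Conditional model for x_j given the remaining columns (one choice
    # among many; the paper discusses DNN and Random Forest learners).
    cond = RandomForestRegressor(n_estimators=100, random_state=0)
    cond.fit(X_minus_j, X[:, j])
    x_j_hat = cond.predict(X_minus_j)
    residual = X[:, j] - x_j_hat

    base_loss = (y - model.predict(X)) ** 2
    loss_diffs = []
    for _ in range(n_perm):
        X_perm = X.copy()
        # Conditionally sampled x_j: fitted part + shuffled residual.
        X_perm[:, j] = x_j_hat + rng.permutation(residual)
        perm_loss = (y - model.predict(X_perm)) ** 2
        loss_diffs.append(perm_loss - base_loss)

    deltas = np.mean(loss_diffs, axis=0)  # per-sample loss increase
    importance = deltas.mean()
    t_stat = importance / (deltas.std(ddof=1) / np.sqrt(len(deltas)))
    p_value = 1.0 - stats.t.cdf(t_stat, df=len(deltas) - 1)
    return importance, p_value
```

On data where y depends on x_0 but not x_2, the conditionally permuted x_0 should yield a large loss increase (small p-value) while x_2 should not, which is the null-preserving behavior the paper's benchmarks test for.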