Efficient High-Order Interaction-Aware Feature Selection Based on Conditional Mutual Information

Authors: Alexander Shishkin, Anastasia Bezzubtseva, Alexey Drutsa, Ilia Shishkov, Ekaterina Gladkikh, Gleb Gusev, Pavel Serdyukov

NeurIPS 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The superiority of our approach is demonstrated by comparison with recently proposed interaction-aware filters and several interaction-agnostic state-of-the-art ones on ten publicly available benchmark datasets. We also empirically validate our approach with 3 state-of-the-art classification models on 10 publicly available benchmark datasets and compare it with known interaction-aware SFS-based filters and several state-of-the-art ones. |
| Researcher Affiliation | Industry | Alexander Shishkin, Anastasia Bezzubtseva, Alexey Drutsa, Ilia Shishkov, Ekaterina Gladkikh, Gleb Gusev, Pavel Serdyukov. Yandex; 16 Leo Tolstoy St., Moscow 119021, Russia. {sisoid,nstbezz,adrutsa,ishfb,kglad,gleb57,pavser}@yandex-team.ru |
| Pseudocode | Yes (simplified sketch below) | Algorithm 1: Pseudo-code of the CMICOT feature selection method (an implementation of this algorithm is available at https://github.com/yandex/CMICOT). |
| Open Source Code | Yes | Algorithm 1: Pseudo-code of the CMICOT feature selection method (an implementation of this algorithm is available at https://github.com/yandex/CMICOT). |
| Open Datasets | Yes | on 10 publicly available benchmark datasets from the UCI ML Repo (including the NIPS 2003 feature selection competition datasets) |
| Dataset Splits | Yes (see the cross-validation sketch below) | The curves on Fig. 1 (b, c) are obtained over a test set, while 10-fold cross-validation [2, 18] is also applied at several key points (e.g., k = 10, 20, 50) to estimate the significance of differences in classification quality. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory, or cloud instance types) used to run its experiments. |
| Software Dependencies | No | The paper mentions using a Naive Bayes classifier (NBC), k-Nearest Neighbors (kNN), and AdaBoost, but it does not provide version numbers for these or for any other software libraries or dependencies. |
| Experiment Setup | No (see the preprocessing and evaluation sketches below) | The paper reports general experimental parameters such as k = 1..50 selected features and t = 1..10 for its method, and discusses preprocessing such as discretization. However, it does not give specific hyperparameters (e.g., learning rate, batch size, optimizer settings) for the classifiers used (NBC, kNN, AdaBoost) or other detailed training configurations in the main text. |
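
To make the Pseudocode row concrete, here is a minimal Python sketch of an interaction-aware sequential forward selection loop in the spirit of Algorithm 1: each candidate feature is scored by a max-min conditional mutual information over small "teams" of already selected features, estimated with a plug-in estimator on discrete data. This is not the authors' implementation; the released code at https://github.com/yandex/CMICOT builds teams with an efficient two-stage greedy search, whereas this sketch enumerates all teams exhaustively, which is only feasible for small t and short selections. All function names here are illustrative.

```python
import numpy as np
from itertools import combinations

def entropy(*cols):
    """Plug-in joint entropy (in bits) of one or more discrete columns."""
    joint = np.stack(cols, axis=1)
    _, counts = np.unique(joint, axis=0, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def cmi(x_cols, y, z_cols):
    """Plug-in estimate of I(X; y | Z); X and Z are lists of discrete columns."""
    if z_cols:
        return (entropy(*x_cols, *z_cols) + entropy(y, *z_cols)
                - entropy(*x_cols, y, *z_cols) - entropy(*z_cols))
    return entropy(*x_cols) + entropy(y) - entropy(*x_cols, y)

def _teams(pool, t):
    """All subsets of `pool` of size at most t (including the empty team)."""
    return [c for r in range(t + 1) for c in combinations(pool, r)]

def team_score(X, y, f, selected, t):
    """Max-min team score for candidate f: a complementary team C tries to
    maximize I({f} U C; y | O) while an opposing team O tries to minimize it.
    Exhaustive team search here; CMICOT itself uses a greedy two-stage search."""
    best = -np.inf
    for comp in _teams(selected, t):
        opp_pool = [s for s in selected if s not in comp]
        worst = min(
            cmi([X[:, f]] + [X[:, c] for c in comp], y, [X[:, o] for o in opp])
            for opp in _teams(opp_pool, t)
        )
        best = max(best, worst)
    return best

def select_features(X, y, k, t=2):
    """Greedy SFS with the interaction-aware max-min score above."""
    selected, remaining = [], set(range(X.shape[1]))
    while len(selected) < k and remaining:
        scores = {f: team_score(X, y, f, selected, t) for f in remaining}
        best = max(scores, key=scores.get)
        selected.append(best)
        remaining.remove(best)
    return selected
```

With t = 0 the score reduces to plain mutual information I(f; y), and with t = 1 it behaves similarly to CMIM-style min-conditional scores; the paper studies team sizes up to t = 10, which is exactly why the exhaustive search above would not scale and the authors use a greedy approximation instead.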
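
The Experiment Setup row notes that the paper discretizes features before computing information-theoretic scores but, per the summary, does not state the binning scheme. The sketch below assumes equal-frequency binning into 5 bins via scikit-learn's KBinsDiscretizer; both the bin count and the strategy are guesses, not the paper's setting.

```python
from sklearn.preprocessing import KBinsDiscretizer

def discretize(X, n_bins=5):
    """Assumed preprocessing: equal-frequency (quantile) binning into ordinal
    codes. The paper mentions discretization; this scheme is an illustrative guess."""
    disc = KBinsDiscretizer(n_bins=n_bins, encode="ordinal", strategy="quantile")
    return disc.fit_transform(X).astype(int)
```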
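
Finally, a minimal sketch of the evaluation protocol from the Dataset Splits row: select k features on discretized data, then estimate classification quality with 10-fold cross-validation at key points such as k = 10, 20, 50. The default classifier hyperparameters, the accuracy metric, and the choice to classify on the original (undiscretized) features are assumptions; `X` and `y` stand in for a loaded UCI benchmark.

```python
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical driver: X, y are the features and labels of a UCI benchmark.
X_disc = discretize(X)                 # selection runs on discretized data
for k in (10, 20, 50):                 # key points checked in the paper
    idx = select_features(X_disc, y, k=k, t=2)
    for name, clf in [("NBC", GaussianNB()), ("kNN", KNeighborsClassifier())]:
        scores = cross_val_score(clf, X[:, idx], y, cv=10)  # 10-fold CV
        print(f"k={k:>2} {name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```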