Variational Information Maximization for Feature Selection

Authors: Shuyang Gao, Greg Ver Steeg, Aram Galstyan

NeurIPS 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments demonstrate that the proposed method strongly outperforms existing information-theoretic feature selection approaches. We also conduct empirical validation on various datasets and demonstrate that the proposed approach outperforms state-of-the-art information-theoretic feature selection methods.
Researcher Affiliation | Academia | Shuyang Gao, Greg Ver Steeg, Aram Galstyan. University of Southern California, Information Sciences Institute. gaos@usc.edu, gregv@isi.edu, galstyan@isi.edu
Pseudocode | No | The paper mentions that
Open Source Code | Yes | Shuyang Gao. Variational feature selection code. http://github.com/BiuBiuBiLL/InfoFeatureSelection
Open Datasets | Yes | We use 17 well-known datasets from previous feature selection studies [5, 12] (all data are discretized). The dataset summaries are given in supplementary Sec. C. We use the average cross-validation error rate over the range of 10 to 100 features to compare different algorithms under the same setting as [12]. Tenfold cross-validation is employed for datasets with number of samples N >= 100 and leave-one-out cross-validation otherwise. The 3-nearest-neighbor classifier is used for Gisette and Madelon, following [5]. For the remaining datasets, the chosen classifier is linear SVM, following [11, 12]. [26] Kevin Bache and Moshe Lichman. UCI Machine Learning Repository, 2013.
Dataset Splits | Yes | Tenfold cross-validation is employed for datasets with number of samples N >= 100 and leave-one-out cross-validation otherwise. The 3-nearest-neighbor classifier is used for Gisette and Madelon, following [5]. For the remaining datasets, the chosen classifier is linear SVM, following [11, 12].
Hardware Specification | No | No specific hardware details (GPU, CPU models, memory, etc.) used for running the experiments are mentioned in the paper.
Software Dependencies | No | No specific software dependencies with version numbers were provided.
Experiment Setup | Yes | We use the average cross-validation error rate over the range of 10 to 100 features to compare different algorithms under the same setting as [12]. Tenfold cross-validation is employed for datasets with number of samples N >= 100 and leave-one-out cross-validation otherwise. The 3-nearest-neighbor classifier is used for Gisette and Madelon, following [5]. For the remaining datasets, the chosen classifier is linear SVM, following [11, 12].
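
As a concrete illustration of the evaluation protocol described above, the sketch below shows how it could be wired up with scikit-learn. This is a minimal sketch, not the authors' released code: the feature ranking is assumed to be precomputed (by the paper's variational method or any other information-theoretic scorer), the 10-feature step size is an assumption (the report only states the 10 to 100 range), and the dataset name is used only to pick the classifier.

```python
# Sketch of the reported evaluation protocol (assumptions: `ranking` is a
# precomputed list of feature indices ordered from best to worst, and the
# step size of 10 selected-feature counts is illustrative only).
import numpy as np
from sklearn.model_selection import KFold, LeaveOneOut, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import LinearSVC


def average_cv_error(X, y, ranking, dataset_name):
    """Average cross-validation error over the top 10..100 ranked features."""
    n_samples = X.shape[0]
    # Tenfold CV for datasets with N >= 100 samples, leave-one-out otherwise.
    if n_samples >= 100:
        cv = KFold(n_splits=10, shuffle=True, random_state=0)
    else:
        cv = LeaveOneOut()
    # 3-nearest-neighbor classifier for Gisette and Madelon, linear SVM otherwise.
    if dataset_name.lower() in {"gisette", "madelon"}:
        clf = KNeighborsClassifier(n_neighbors=3)
    else:
        clf = LinearSVC()
    errors = []
    for k in range(10, 101, 10):
        top_k = ranking[:k]  # indices of the k highest-ranked features
        accuracy = cross_val_score(clf, X[:, top_k], y, cv=cv).mean()
        errors.append(1.0 - accuracy)
    return float(np.mean(errors))
```

A full run in the spirit of the reported setup would simply loop this function over the (X, y, ranking) triples for each of the 17 datasets and compare the resulting average error rates across feature selection methods.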