Kernel Feature Selection via Conditional Covariance Minimization
Authors: Jianbo Chen, Mitchell Stern, Martin J. Wainwright, Michael I. Jordan
NeurIPS 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We prove various consistency results for this procedure, and also demonstrate that our method compares favorably with other state-of-the-art algorithms on a variety of synthetic and real data sets. |
| Researcher Affiliation | Academia | Jianbo Chen, University of California, Berkeley (jianbochen@berkeley.edu); Mitchell Stern, University of California, Berkeley (mitchell@berkeley.edu); Martin J. Wainwright, University of California, Berkeley (wainwrig@berkeley.edu); Michael I. Jordan, University of California, Berkeley (jordan@berkeley.edu) |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code for our approach is publicly available at https://github.com/Jianbo-Lab/CCM. |
| Open Datasets | Yes | We carry out experiments on 12 standard benchmark tasks from the ASU feature selection website [17] and the UCI repository [18]. |
| Dataset Splits | Yes | Performance is then measured by training a kernel SVM on the top m features and computing the resulting accuracy as measured by 5-fold cross-validation. (A sketch of this evaluation protocol appears after the table.) |
| Hardware Specification | No | No specific hardware details (e.g., CPU, GPU models, or cloud computing specifications) are provided for running experiments. |
| Software Dependencies | No | The paper mentions using the "Scikit-learn [20] and Scikit-feature [17] packages" and the "author's implementation" of BAHSIC, but does not provide specific version numbers for these software components. |
| Experiment Setup | Yes | For kernel-based methods, we use a Gaussian kernel k(x, x′) = exp(−‖x − x′‖²/(2σ²)) on X and a linear kernel k(y, y′) = yᵀy′ on Y. We take σ to be the median pairwise distance between samples scaled by 1/√2. We use ε = 0.001 for the classification tasks and ε = 0.1 for the regression task, selecting these values from {0.001, 0.01, 0.1} using cross-validation. For our own algorithm, we fix ε = 0.001 across all experiments and set the number of desired features to m = 100 if d > 100 or ⌈d/5⌉ otherwise. In all cases we fix the regularization constant of the SVM to C = 1. (A sketch of these kernel settings appears after the table.) |
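
To make the evaluation protocol in the "Dataset Splits" row concrete, here is a minimal Python sketch using scikit-learn (which the paper itself relies on). The `feature_scores` ranking and the function name are illustrative placeholders, not the authors' released code.

```python
# A minimal sketch of the evaluation protocol quoted under "Dataset Splits":
# train an RBF-kernel SVM on the top-m features and report 5-fold
# cross-validated accuracy. `feature_scores` and the function name are
# illustrative placeholders, not part of the paper's released code.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def evaluate_top_m_features(X, y, feature_scores, m, C=1.0):
    """Mean 5-fold CV accuracy of a kernel SVM on the m highest-scoring features."""
    top_m = np.argsort(feature_scores)[::-1][:m]   # indices of the top-m features
    clf = SVC(kernel="rbf", C=C)                   # C = 1, as fixed in the paper
    scores = cross_val_score(clf, X[:, top_m], y, cv=5)
    return scores.mean()
```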
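
Similarly, the kernel settings in the "Experiment Setup" row can be sketched as follows, assuming NumPy/SciPy; the function names are illustrative and not taken from the released CCM repository.

```python
# A minimal sketch of the kernel settings quoted under "Experiment Setup":
# a Gaussian kernel on X with bandwidth sigma set to the median pairwise
# distance scaled by 1/sqrt(2), and a linear kernel on Y. Function names
# are illustrative, not taken from the CCM repository.
import numpy as np
from scipy.spatial.distance import pdist, squareform

def gaussian_kernel_median_heuristic(X):
    """K[i, j] = exp(-||x_i - x_j||^2 / (2 sigma^2)), sigma = median distance / sqrt(2)."""
    dists = pdist(X)                         # condensed pairwise Euclidean distances
    sigma = np.median(dists) / np.sqrt(2)    # median heuristic, scaled by 1/sqrt(2)
    sq_dists = squareform(dists) ** 2        # full matrix of squared distances
    return np.exp(-sq_dists / (2 * sigma ** 2))

def linear_kernel(Y):
    """K[i, j] = y_i^T y_j on the responses (e.g., one-hot class labels)."""
    Y = np.asarray(Y, dtype=float)
    if Y.ndim == 1:
        Y = Y[:, None]                       # treat scalar responses as 1-D vectors
    return Y @ Y.T
```

Note that with σ equal to the median pairwise distance divided by √2, the denominator 2σ² equals the squared median distance, so the Gaussian kernel evaluates to e⁻¹ for a pair of points at the median separation.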