IVFS: Simple and Efficient Feature Selection for High Dimensional Topology Preservation
Authors: Xiaoyun Li, Chenxi Wu, Ping Li
AAAI 2020, pp. 4747-4754
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct extensive experiments on high dimensional datasets to justify the effectiveness of IVFS. The results suggest that IVFS is capable of preserving the exact topological signatures (as well as the pairwise distances), making it a suitable tool for TDA and many other applications. Each method is compared using widely adopted metrics that evaluate the quality of the selected feature set; Table 1 summarizes the results. |
| Researcher Affiliation | Industry | Xiaoyun Li, Chenxi Wu, Ping Li, Cognitive Computing Lab, Baidu Research, 10900 NE 8th St., Bellevue, WA 98004, USA. {lixiaoyun996, wuchenxi2013, pingli98}@gmail.com |
| Pseudocode | Yes | Algorithm 1: IVFS scheme for feature selection (a hedged sketch of this scheme appears after the table). |
| Open Source Code | No | The paper does not provide any explicit statements about releasing source code, nor does it include links to a code repository. |
| Open Datasets | Yes | Following many previous works on feature selection (e.g., Zhao and Liu 2007; Liu et al. 2014; Li et al. 2018b), we carry out extensive experiments on popular high-dimensional datasets from the UCI repository (Asuncion and Newman 2007) and the ASU feature selection database (Li et al. 2018a). |
| Dataset Splits | No | The paper specifies a train/test split: 'Each dataset is randomly split into 80% training sets and 20% test set'. However, it does not explicitly describe a validation split or model-selection methodology. (A minimal split sketch follows the table.) |
| Hardware Specification | No | The paper does not provide any specific hardware details such as GPU or CPU models, memory specifications, or types of computing environments used for the experiments. |
| Software Dependencies | No | The paper discusses various algorithms and methods implemented but does not specify any software dependencies with version numbers (e.g., programming languages, libraries, or frameworks). |
| Experiment Setup | Yes | We try the following combinations: d̃ = {0.1 : 0.1 : 0.5}·d, ñ = {100, 0.1n, 0.3n, 0.5n}. We run experiments with k = 1000, 3000, 5000. For the number of neighbors, we adopt K ∈ {1, 3, 5, 10} and report the highest mean accuracy. (A sketch enumerating this sweep follows the table.) |
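
Since no source code is released, the following is a minimal Python sketch of the random-subset inclusion-value scheme that Algorithm 1 describes, under our reading of the paper: score random feature subsets by how well they preserve pairwise distances on a random subsample, then average each feature's score over the subsets containing it. The function name, the sup-norm distance-gap loss, and all defaults are assumptions, not the authors' implementation.

```python
import numpy as np
from scipy.spatial.distance import pdist

def ivfs_select(X, d_target, k=1000, n_sub=100, d_sub=None, seed=None):
    """Hypothetical sketch of the IVFS random-subset scheme (Algorithm 1)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    d_sub = d_sub or max(1, int(0.3 * d))  # assumed default: 30% of features
    score_sum = np.zeros(d)
    count = np.zeros(d)
    for _ in range(k):
        rows = rng.choice(n, size=min(n_sub, n), replace=False)
        cols = rng.choice(d, size=d_sub, replace=False)
        full = pdist(X[rows])                 # distances using all features
        part = pdist(X[np.ix_(rows, cols)])   # distances using the subset only
        loss = np.abs(full - part).max()      # sup-norm gap (one possible loss)
        score_sum[cols] += -loss              # lower loss -> higher inclusion value
        count[cols] += 1
    # Features never sampled keep a score of 0; with k in the thousands
    # every feature is sampled many times, so this rarely matters.
    inclusion_value = score_sum / np.maximum(count, 1)
    return np.argsort(-inclusion_value)[:d_target]
```

With k on the order of thousands, each feature's inclusion value is averaged over many subsets, which is what makes the per-subset loss a usable per-feature score.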
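The quoted 80/20 random split (with no validation set described) could be reproduced along these lines; the placeholder data and `random_state` are illustrative, not from the paper.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data standing in for a high-dimensional dataset.
X, y = np.random.rand(100, 500), np.random.randint(0, 2, 100)

# 80% training / 20% test, randomly split as the paper states.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
```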
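For the experiment-setup row, a small sketch enumerating the swept settings may be easier to scan than the inline notation; all names are illustrative, and the separation of the subset sweep (d̃, ñ, k) from the kNN evaluation sweep (K) follows the quoted text.

```python
import itertools

def ivfs_grid(n, d):
    """Hypothetical enumeration of the reported sweep (names illustrative)."""
    d_subs = [int(f * d) for f in (0.1, 0.2, 0.3, 0.4, 0.5)]  # d̃ = {0.1:0.1:0.5}·d
    n_subs = [100, int(0.1 * n), int(0.3 * n), int(0.5 * n)]  # ñ
    ks = [1000, 3000, 5000]                                   # number of random subsets k
    return list(itertools.product(d_subs, n_subs, ks))

# K ∈ {1, 3, 5, 10} neighbors are then tried in the kNN evaluation,
# reporting the highest mean accuracy, per the quoted setup.
```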