Consensus Guided Unsupervised Feature Selection

Authors: Hongfu Liu, Ming Shao, Yun Fu

AAAI 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on several real-world data sets demonstrate that our methods are superior to the most recent state-of-the-art works in terms of NMI.
Researcher Affiliation | Academia | Hongfu Liu (1), Ming Shao (1), Yun Fu (1,2); (1) Department of Electrical and Computer Engineering, Northeastern University, Boston; (2) College of Computer and Information Science, Northeastern University, Boston; liu.hongf@husky.neu.edu, mingshao@ece.neu.edu, yunfu@ece.neu.edu
Pseudocode | Yes | Algorithm 1: UFS with Utility Function
Require: X: the data matrix; B: the concatenation of r basic partitions; α, β: the trade-off parameters.
1: Initialize H, Z and F;
2: repeat
3:   Build the matrix U = [αB, XZ];
4:   Run K-means on U to update H and G;
5:   Update Z = (X^T X + βF)^(-1) X^T H G;
6:   Update F according to Z;
7: until the objective function value of Eq. 5 remains unchanged.
Ensure: Sort all m features by ||z_i||_2 and select the desired number of top-ranked features.
Open Source Code | No | The paper does not provide concrete access to source code for the methodology described. It mentions a URL for theorem proofs, but not for the implementation code itself.
Open Datasets | Yes | Data sets. Eight public data sets are used to evaluate the performance of CGUFS, including 4 image data sets and 4 text data sets. Table 1 summarizes some important characteristics of these 8 benchmark data sets.
Dataset Splits | No | The paper does not provide specific dataset split information (exact percentages, sample counts, citations to predefined splits, or detailed splitting methodology) needed to reproduce the data partitioning. It mentions using k-means to "validate the performance" but without specifying dataset splits.
Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments.
Software Dependencies | No | The paper does not provide specific ancillary software details (e.g., library or solver names with version numbers) needed to replicate the experiment.
Experiment Setup | Yes | Parameter setting. For LS, MCFS and NDFS, the number of neighbors is set to 5 for the Laplacian graph. In the CGUFS framework, we employ the Random Parameter Selection strategy to generate basic partitions. Generally speaking, k-means is conducted on all features with different cluster numbers from K to √n; for the ORL and Yale data sets, the number of clusters varies in [2, 2K], where K is the true cluster number. 100 basic partitions are produced for robustness. Here we set the cluster structural parameter α = 10^4 and the sparse regularization parameter β = 1. The numbers of selected features vary from 50 to 300 with an interval of 50, and k-means is then run on the selected features to validate the performance. Each algorithm runs 20 times, and the average result and standard deviation are reported.
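The Random Parameter Selection step described above can be sketched as follows: run k-means on all features r times, each time with a randomly drawn cluster number, and concatenate the resulting one-hot label matrices into B. The draw range [K, k_max] is parameterized here because the paper varies it per data set (e.g. [2, 2K] for ORL and Yale); the per-run seeding and the toy k-means are illustrative choices, not the authors' setup.

```python
import numpy as np

def _kmeans_labels(X, k, iters=20, seed=0):
    """Plain Lloyd's k-means returning only the cluster labels."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(X.shape[0], size=k, replace=False)]
    labels = np.zeros(X.shape[0], dtype=int)
    for _ in range(iters):
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            mask = labels == j
            if mask.any():
                centers[j] = X[mask].mean(axis=0)
    return labels

def random_parameter_selection(X, K, r=100, k_max=None, seed=0):
    """Generate r basic partitions by Random Parameter Selection.

    Each partition is k-means on all features with a cluster number drawn
    uniformly from [K, k_max]; the one-hot label matrices are concatenated
    column-wise into B, the input expected by the CGUFS algorithm.
    """
    rng = np.random.default_rng(seed)
    if k_max is None:
        k_max = 2 * K                      # assumed default range [K, 2K]
    blocks = []
    for i in range(r):
        k = int(rng.integers(K, k_max + 1))            # random cluster number
        labels = _kmeans_labels(X, k, seed=seed + i + 1)
        blocks.append(np.eye(k)[labels])               # (n, k) one-hot block
    return np.hstack(blocks)                            # B: (n, sum of k's)
```

Because each block contributes exactly one 1 per row, every row of B sums to r, which is a quick sanity check that all r partitions assigned every sample to a cluster.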