Consensus Guided Unsupervised Feature Selection
Authors: Hongfu Liu, Ming Shao, Yun Fu
AAAI 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on several real-world data sets demonstrate that our methods are superior to the most recent state-of-the-art works in terms of NMI. |
| Researcher Affiliation | Academia | Hongfu Liu1, Ming Shao1, Yun Fu1,2 1Department of Electrical and Computer Engineering, Northeastern University, Boston 2College of Computer and Information Science, Northeastern University, Boston liu.hongf@husky.neu.edu, mingshao@ece.neu.edu, yunfu@ece.neu.edu |
| Pseudocode | Yes | Algorithm 1 UFS with Utility Function. Require: X: the data matrix; B: the matrix concatenating r basic partitions; α, β: the trade-off parameters. 1: Initialize H, Z and F; 2: repeat 3: Build the matrix U = [αB, XZ]; 4: Run K-means on U to update H and G; 5: Update Z = (XᵀX + βF)⁻¹XᵀHG; 6: Update F according to Z; 7: until the objective function value of Eq. 5 remains unchanged. Ensure: Sort all m features by ‖zᵢ‖₂ and select the desired number of top-ranked features. |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described. It mentions a URL for theorem proofs, but not for the implementation code itself. |
| Open Datasets | Yes | Data sets. Eight public data sets are used to evaluate the performance of CGUFS including 4 image data sets and 4 text data sets. Table 1 summarizes some important characteristics of these 8 benchmark data sets. |
| Dataset Splits | No | The paper does not provide specific dataset split information (exact percentages, sample counts, citations to predefined splits, or detailed splitting methodology) needed to reproduce the data partitioning. It mentions using k-means to 'validate the performance' but without specifying dataset splits. |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details (e.g., library or solver names with version numbers) needed to replicate the experiment. |
| Experiment Setup | Yes | Parameter setting. For LS, MCFS and NDFS, the number of neighbors is set to be 5 for the Laplacian graph. In the CGUFS framework, we employ the Random Parameter Selection strategy to generate basic partitions. Generally speaking, k-means is conducted on all features with different cluster numbers from K to n; for the ORL and Yale data sets, the number of clusters varies in [2, 2K], where K is the true cluster number. 100 basic partitions are produced for robustness. Here we set the cluster structural parameter α = 10⁴ and the sparse regularization parameter β = 1. The numbers of selected features vary from 50 to 300 with 50 as the interval, then k-means is used on the selected features to validate the performance. Each algorithm runs 20 times, and the average result and standard deviation are reported. |
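The quoted Algorithm 1 alternates between a k-means step on the concatenated matrix U and a closed-form update of the projection Z. A minimal numpy sketch of that loop is shown below; the bracketed concatenation `U = [αB, XZ]`, the diagonal ℓ2,1-reweighting of F, and the tiny Lloyd's k-means helper are all assumptions recovered from the garbled pseudocode, not the authors' released implementation.

```python
import numpy as np

def cgufs_utility(X, B, n_clusters, alpha=1e4, beta=1.0, dim=None,
                  n_iter=30, seed=0):
    """Sketch of Algorithm 1 (UFS with utility function).
    X: (n, m) data matrix; B: (n, R) one-hot concatenation of basic
    partitions.  Returns one score ||z_i||_2 per feature (higher = better)."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    d = dim or n_clusters
    Z = rng.standard_normal((m, d)) * 0.01
    F = np.eye(m)
    for _ in range(n_iter):
        # Step 3: concatenate the scaled basic partitions with the
        # projected data (bracketed concatenation is an assumption).
        U = np.hstack([alpha * B, X @ Z])
        # Step 4: k-means on U -> indicator H and centroids.
        labels, centers = kmeans(U, n_clusters, rng)
        H = np.eye(n_clusters)[labels]      # (n, K) one-hot indicator
        G = centers[:, B.shape[1]:]         # centroid block matching XZ
        # Step 5: ridge-style closed-form update of Z.
        Z = np.linalg.solve(X.T @ X + beta * F, X.T @ H @ G)
        # Step 6: diagonal l2,1 reweighting of F (assumed form).
        row_norms = np.linalg.norm(Z, axis=1)
        F = np.diag(1.0 / (2.0 * np.maximum(row_norms, 1e-12)))
    return np.linalg.norm(Z, axis=1)        # rank features by row norm

def kmeans(U, k, rng, n_iter=50):
    """Tiny Lloyd's k-means, a stand-in for any library implementation."""
    centers = U[rng.choice(len(U), k, replace=False)]
    for _ in range(n_iter):
        d2 = ((U[:, None, :] - centers[None]) ** 2).sum(-1)
        labels = d2.argmin(1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = U[labels == j].mean(0)
    return labels, centers
```

The paper stops the loop when the objective of Eq. 5 stabilizes; the fixed iteration count here is a simplification.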
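The Random Parameter Selection strategy described in the setup (run k-means many times with varying cluster numbers, then stack the resulting partitions into B) can be sketched as follows; the upper bound on the cluster number and the minimal Lloyd's helper are illustrative assumptions, since the quoted range ("from K to n") is ambiguous in the extracted text.

```python
import numpy as np

def random_basic_partitions(X, K, r=100, seed=0):
    """Generate r basic partitions by k-means with a randomly drawn
    cluster number, and concatenate their one-hot indicator matrices
    column-wise into B (shape: n x sum of cluster counts)."""
    rng = np.random.default_rng(seed)
    n = len(X)
    upper = max(K + 1, int(np.sqrt(n)))   # assumed upper bound on k
    blocks = []
    for _ in range(r):
        k = int(rng.integers(K, upper + 1))
        labels = lloyd(X, k, rng)
        blocks.append(np.eye(k)[labels])  # one-hot indicator per partition
    return np.hstack(blocks)

def lloyd(X, k, rng, n_iter=20):
    """Plain Lloyd's k-means returning hard cluster labels."""
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(n_iter):
        labels = ((X[:, None] - centers[None]) ** 2).sum(-1).argmin(1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(0)
    return labels
```

Each row of B then contains exactly one 1 per basic partition, matching the paper's description of B as a concatenation of r basic partitions.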