Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Unsupervised Personalized Feature Selection
Authors: Jundong Li, Liang Wu, Harsh Dani, Huan Liu
AAAI 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results on real-world datasets verify the effectiveness of the proposed UPFS framework. |
| Researcher Affiliation | Academia | Jundong Li, Liang Wu, Harsh Dani, Huan Liu. Computer Science and Engineering, Arizona State University, USA |
| Pseudocode | Yes | Algorithm 1 Unsupervised Personalized Feature Selection |
| Open Source Code | No | The paper does not provide any concrete access information (e.g., specific repository link, explicit code release statement) for its source code. |
| Open Datasets | Yes | We choose 9 datasets from various domains, including (1) four text datasets: CNNStory, BlogCatalog, Flickr and DBLP; (2) two image datasets: Yale and warpPIE10P; (3) three biology datasets: Carcinoma, Prostate-GE and TOX171. Detailed statistics of the used datasets are shown in Table 1. |
| Dataset Splits | No | The paper does not explicitly provide specific training/validation/test dataset splits by percentage or sample counts. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers (e.g., library or solver names with version numbers) needed to replicate the experiment. |
| Experiment Setup | Yes | In Laplacian Score, MCFS, NDFS, FSASL and UPFS, we specify the number of nearest neighbors k as 5. It is still an open question to decide the optimal number of selected features in feature selection research. Thus, we set the number of selected features among {10, 20, ..., 300} and report the best clustering results. To study how its variation affects the feature selection performance, we fix two parameters each time and vary the third one in the range of {0.001, 0.01, 1, 10, 100, 1000}. |
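
The protocol quoted in the Experiment Setup row can be sketched as a small configuration grid. This is an illustrative sketch only, not the authors' code: the parameter names (`param_1`, `param_2`, `param_3`), the fixed value of 1 used for the two parameters held constant, and the helper names are all assumptions, since the quoted text does not state them. The value list is copied exactly as quoted above.

```python
# Sketch of the quoted evaluation protocol (hypothetical names throughout).

K_NEIGHBORS = 5  # number of nearest neighbors k, as quoted

# Number of selected features: {10, 20, ..., 300}
FEATURE_COUNTS = list(range(10, 301, 10))

# Values for the parameter being varied, exactly as quoted in the table
PARAM_VALUES = [0.001, 0.01, 1, 10, 100, 1000]

# Placeholder names; the quoted text only says there are three parameters
PARAM_NAMES = ["param_1", "param_2", "param_3"]


def sensitivity_grid(param_names, values, fixed_value=1):
    """Build configs in which one parameter varies and the rest stay fixed.

    fixed_value is an assumption; the quoted text does not say at which
    value the other two parameters are held.
    """
    configs = []
    for varied in param_names:
        for value in values:
            config = {name: fixed_value for name in param_names}
            config[varied] = value
            configs.append(config)
    return configs


def best_over_feature_counts(score_fn, feature_counts):
    """Return (feature_count, score) with the highest score.

    score_fn is a caller-supplied function mapping a feature count to a
    clustering quality score; higher is assumed to be better.
    """
    best_count = max(feature_counts, key=score_fn)
    return best_count, score_fn(best_count)
```

Calling `sensitivity_grid(PARAM_NAMES, PARAM_VALUES)` yields one configuration per (parameter, value) pair, and `best_over_feature_counts` mirrors the "report the best clustering results" step for whatever scoring function is supplied.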