Partially Supervised Graph Embedding for Positive Unlabelled Feature Selection
Authors: Yufei Han, Yun Shen
IJCAI 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on different benchmark databases and a real-world cyber security application demonstrate the effectiveness of our algorithm. |
| Researcher Affiliation | Industry | Yufei Han and Yun Shen, Symantec Research Labs, Yufei_Han@symantec.com, Yun_Shen@symantec.com |
| Pseudocode | No | The paper describes the algorithm mathematically and in prose but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any explicit statements about open-sourcing the code for the methodology described, nor does it include a link to a code repository. |
| Open Datasets | Yes | We first perform the experiments to verify the effectiveness of PUFS on three semi-supervised learning benchmark datasets USPS, COIL and G241 [Chapelle et al., 2006]. They are available from http://www.kyb.tuebingen.mpg.de/sslbook. |
| Dataset Splits | Yes | We randomly select 80% of the entire data as training data, and the rest 20% as testing data. For each partition, all six feature selection algorithms are performed on the training data and select N best features. In the training data set, we choose randomly 10% of the positive training samples as labelled data P and treat the rest as unlabelled data U. This is designed to simulate the real world PU learning scenario, such as BGP hijacking events, where positively labelled samples are extremely limited. To evaluate feature subsets selected by different feature selection algorithms, a linear support vector machine (SVM), is built using 5-fold cross-validation on the test data set with these feature subsets. (An illustrative sketch of this split-and-evaluation protocol follows the table.) |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments (e.g., GPU/CPU models, memory, or cloud instance types). |
| Software Dependencies | No | The paper mentions using a 'linear support vector machine (SVM)' but does not specify any software names with version numbers (e.g., specific libraries, frameworks, or programming languages with their versions) that would be needed for replication. |
| Experiment Setup | Yes | For NDFS, JELSR and PUFS, the size of neighbourhood (k) of KNN affinity graph is specified to be 10 for all datasets. In the proposed PUFS, δ and γ in Eq.7 are fixed at 10^3 and 10^5 for all datasets, providing consistent results. We determine , M and C in Eq.4 by grid search and finally fix them as 1, 10 and 40 in the experiments respectively. (A hedged sketch of the k-NN affinity graph construction follows the table.) |
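The Dataset Splits row describes the evaluation protocol precisely enough to sketch it in code. The sketch below is an illustrative reconstruction, not the authors' implementation: `select_features` stands in for whichever of the six feature selection methods is being evaluated, and the label encoding, random seed, and use of scikit-learn are assumptions.

```python
# Hedged sketch of the quoted evaluation protocol: 80/20 split, 10% of positive
# training samples kept as labelled (P), the rest treated as unlabelled (U),
# and a linear SVM scored with 5-fold cross-validation on the test split.
import numpy as np
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.svm import LinearSVC

def evaluate_selected_features(X, y, select_features, n_features, seed=0):
    """Return the mean 5-fold CV accuracy of a linear SVM on the selected features.

    `select_features(X_train, pu_labels, n_features)` is a hypothetical callable
    returning the indices of the N best features; it is not defined in the paper.
    """
    rng = np.random.RandomState(seed)

    # 80% training / 20% testing split of the whole dataset.
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=seed)

    # Simulate the PU setting: only 10% of positive training samples stay labelled.
    pos_idx = np.flatnonzero(y_tr == 1)
    labelled = rng.choice(pos_idx, size=max(1, int(0.1 * len(pos_idx))), replace=False)
    pu_labels = np.zeros(len(y_tr), dtype=int)   # 0 = unlabelled (U)
    pu_labels[labelled] = 1                       # 1 = labelled positive (P)

    # Feature selection is run on the training data only.
    selected = select_features(X_tr, pu_labels, n_features)

    # Linear SVM on the selected features, 5-fold CV on the test set, as quoted.
    scores = cross_val_score(LinearSVC(), X_te[:, selected], y_te, cv=5)
    return scores.mean()
```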
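Similarly, the Experiment Setup row fixes the k-NN affinity graph size (k = 10) but not its edge weighting. The sketch below assumes a heat-kernel weighting with an arbitrary bandwidth `sigma`; the Eq.7 parameters δ = 10^3 and γ = 10^5 belong to the PUFS objective itself and are not reproduced here.

```python
# Minimal sketch of a k-NN affinity graph (k = 10 for all datasets, per the paper).
# The heat-kernel weighting and bandwidth are assumptions, not stated in the paper.
import numpy as np
from sklearn.neighbors import kneighbors_graph

def knn_affinity(X, k=10, sigma=1.0):
    """Build a symmetric sparse affinity matrix W from a k-nearest-neighbour graph."""
    # Distances to the k nearest neighbours of each sample.
    D = kneighbors_graph(X, n_neighbors=k, mode='distance', include_self=False)
    W = D.copy()
    W.data = np.exp(-(W.data ** 2) / (2 * sigma ** 2))   # assumed heat kernel
    W = 0.5 * (W + W.T)                                   # symmetrise the graph
    return W
```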