Partially Supervised Graph Embedding for Positive Unlabelled Feature Selection
Authors: Yufei Han, Yun Shen
IJCAI 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on different benchmark databases and a real-world cyber security application demonstrate the effectiveness of our algorithm. |
| Researcher Affiliation | Industry | Yufei Han and Yun Shen, Symantec Research Labs, Yufei_Han@symantec.com, Yun_Shen@symantec.com |
| Pseudocode | No | The paper describes the algorithm mathematically and in prose but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any explicit statements about open-sourcing the code for the methodology described, nor does it include a link to a code repository. |
| Open Datasets | Yes | We first perform the experiments to verify the effectiveness of PUFS on three semi-supervised learning benchmark datasets USPS, COIL and G241 [Chapelle et al., 2006]. They are available from http://www.kyb.tuebingen.mpg.de/sslbook. |
| Dataset Splits | Yes | We randomly select 80% of the entire data as training data, and the rest 20% as testing data. For each partition, all six feature selection algorithms are performed on the training data and select N best features. In the training data set, we choose randomly 10% of the positive training samples as labelled data P and treat the rest as unlabelled data U. This is designed to simulate the real world PU learning scenario, such as BGP hijacking events, where positively labelled samples are extremely limited. To evaluate feature subsets selected by different feature selection algorithms, a linear support vector machine (SVM), is built using 5-fold cross-validation on the test data set with these feature subsets. (An illustrative sketch of this split-and-evaluation protocol follows the table.) |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments (e.g., GPU/CPU models, memory, or cloud instance types). |
| Software Dependencies | No | The paper mentions using a 'linear support vector machine (SVM)' but does not specify any software names with version numbers (e.g., specific libraries, frameworks, or programming languages with their versions) that would be needed for replication. |
| Experiment Setup | Yes | For NDFS, JELSR and PUFS, the size of neighbourhood (k) of KNN affinity graph is specified to be 10 for all datasets. In the proposed PUFS, δ and γ in Eq.7 are fixed at 10^3 and 10^5 for all datasets, providing consistent results. We determine , M and C in Eq.4 by grid search and finally fix them as 1, 10 and 40 in the experiments respectively. (A hedged sketch of the k-NN affinity graph construction follows the table.) |
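The Dataset Splits row describes the evaluation protocol precisely enough to sketch it in code. The sketch below is an illustrative reconstruction, not the authors' implementation: `select_features` stands in for whichever of the six feature selection methods is being evaluated, and the label encoding, random seed, and use of scikit-learn are assumptions.

```python
# Hedged sketch of the quoted evaluation protocol: 80/20 split, 10% of positive
# training samples kept as labelled (P), the rest treated as unlabelled (U),
# and a linear SVM scored with 5-fold cross-validation on the test split.
import numpy as np
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.svm import LinearSVC

def evaluate_selected_features(X, y, select_features, n_features, seed=0):
    """Return the mean 5-fold CV accuracy of a linear SVM on the selected features.

    `select_features(X_train, pu_labels, n_features)` is a hypothetical callable
    returning the indices of the N best features; it is not defined in the paper.
    """
    rng = np.random.RandomState(seed)

    # 80% training / 20% testing split of the whole dataset.
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=seed)

    # Simulate the PU setting: only 10% of positive training samples stay labelled.
    pos_idx = np.flatnonzero(y_tr == 1)
    labelled = rng.choice(pos_idx, size=max(1, int(0.1 * len(pos_idx))), replace=False)
    pu_labels = np.zeros(len(y_tr), dtype=int)   # 0 = unlabelled (U)
    pu_labels[labelled] = 1                       # 1 = labelled positive (P)

    # Feature selection is run on the training data only.
    selected = select_features(X_tr, pu_labels, n_features)

    # Linear SVM on the selected features, 5-fold CV on the test set, as quoted.
    scores = cross_val_score(LinearSVC(), X_te[:, selected], y_te, cv=5)
    return scores.mean()
```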
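Similarly, the Experiment Setup row fixes the k-NN affinity graph size (k = 10) but not its edge weighting. The sketch below assumes a heat-kernel weighting with an arbitrary bandwidth `sigma`; the Eq.7 parameters δ = 10^3 and γ = 10^5 belong to the PUFS objective itself and are not reproduced here.

```python
# Minimal sketch of a k-NN affinity graph (k = 10 for all datasets, per the paper).
# The heat-kernel weighting and bandwidth are assumptions, not stated in the paper.
import numpy as np
from sklearn.neighbors import kneighbors_graph

def knn_affinity(X, k=10, sigma=1.0):
    """Build a symmetric sparse affinity matrix W from a k-nearest-neighbour graph."""
    # Distances to the k nearest neighbours of each sample.
    D = kneighbors_graph(X, n_neighbors=k, mode='distance', include_self=False)
    W = D.copy()
    W.data = np.exp(-(W.data ** 2) / (2 * sigma ** 2))   # assumed heat kernel
    W = 0.5 * (W + W.T)                                   # symmetrise the graph
    return W
```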