Feature Selection at the Discrete Limit

Authors: Miao Zhang, Chris Ding, Ya Zhang, Feiping Nie

AAAI 2014

Reproducibility assessment (variable, result, and supporting LLM response):
Research Type: Experimental
Evidence: "Experiments on real-life datasets show that features selected at small p consistently outperform features selected at p = 1 (the standard L2,1 approach) and other popular feature selection methods."
Researcher Affiliation: Academia
Evidence: "1 University of Texas at Arlington, Texas, USA, 76019; 2 Shanghai Jiao Tong University, Shanghai, China, 200240"
Pseudocode: Yes
Evidence: Algorithm 1, "Rank-one Update Algorithm" (a hedged NumPy sketch follows this entry):
Input: X, Y, W0; parameters λ and p of the L2,p norm
Output: W
Procedure:
  1:  W = W0
  2:  while W not converged do
  3:    for r = 1 to d do
  4:      Y_r = Y − Σ_{i≠r} x_i w^i
  5:      b = Y_r^T x_r / ||x_r||^2
  6:      β = λ / (2 ||x_r||^2)
  7:      w^r ← argmin_{w^r} (1/2) ||w^r − b||^2 + β ||w^r||^p
  8:      switch p:
  9:        case 1 (p = 1): standard L2,1 norm
  10:       case 2 (p = 0.5): solve w^r using Eq. (23)
  11:       case 3 (p = 0): solve w^r using Eq. (20)
  12:       case 4 (0 < p < 1, p ≠ 0.5): solve w^r using Eq. (25)
  13:   end for
  14: end while
  15: Output W
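The reconstructed algorithm maps to a short NumPy implementation. The following is a minimal sketch, assuming X is d×n (features by samples), Y is n×k, and W is d×k with rows w^r. The paper's closed-form solvers for p = 0.5 and general 0 < p < 1 (Eqs. (20), (23), (25)) are not reproduced in this report, so the sketch substitutes a 1-D numerical search for those cases; only the p = 0 and p = 1 branches are exact.

    import numpy as np

    def prox_l2p(b, beta, p):
        """Minimize 0.5*||w - b||^2 + beta*||w||^p over vectors w.

        The minimizer is a nonnegative rescaling of b, so the problem
        reduces to a 1-D search over s = ||w|| >= 0.
        """
        nb = np.linalg.norm(b)
        if nb == 0.0:
            return np.zeros_like(b)
        if p == 1:
            # Soft thresholding of the row norm (standard L2,1 case).
            s = max(0.0, nb - beta)
        elif p == 0:
            # Hard thresholding: keep the row only if it beats the
            # constant cost beta of being nonzero.
            s = nb if 0.5 * nb ** 2 > beta else 0.0
        else:
            # 0 < p < 1: numerical stand-in for the paper's closed forms.
            grid = np.linspace(0.0, nb, 1001)
            cost = 0.5 * (grid - nb) ** 2 + beta * grid ** p
            s = grid[np.argmin(cost)]
        return (s / nb) * b

    def rank_one_update(X, Y, W0, lam, p, n_iters=20):
        """Cyclic rank-one updates of the rows of W (Algorithm 1 sketch)."""
        W = W0.copy()
        d = X.shape[0]
        R = Y - X.T @ W                          # current residual, n x k
        for _ in range(n_iters):
            for r in range(d):
                x_r = X[r]                       # feature r across samples
                Y_r = R + np.outer(x_r, W[r])    # residual with row r removed
                nx2 = float(x_r @ x_r)
                b = Y_r.T @ x_r / nx2            # step 5
                beta = lam / (2.0 * nx2)         # step 6
                W[r] = prox_l2p(b, beta, p)      # step 7
                R = Y_r - np.outer(x_r, W[r])
        return W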
Open Source Code: No
Evidence: The paper makes no explicit statement about releasing source code for the described method and provides no link to a code repository.
Open Datasets: Yes
Evidence: "To validate the performance of our L2,p feature selection method, we apply it on three data sets: the DNA dataset, which belongs to the Statlog collection and was used in (Hsu and Lin 2002), and two publicly available microarray datasets: the small round blue cell tumors (SRBCT) dataset (Khan et al. 2001) and the malignant glioma (GLIOMA) dataset (Nutt et al. 2003)."
Dataset Splits: Yes
Evidence: "For each dataset, we randomly split the data X into a training set and a testing set of equal size, i.e., 50% of the data is for training and 50% for testing. To get good statistics, we rerun this splitting process 20 times, so the splits differ from run to run. For each run, we adopt cross validation to ensure the fairness of the evaluation." (A sketch of this protocol follows this entry.)
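The quoted protocol is straightforward to reproduce. A minimal sketch, assuming scikit-learn and a hypothetical select_and_score helper that trains the feature selector plus classifier on the training half and returns test accuracy; the paper's own evaluation code is not available:

    import numpy as np
    from sklearn.model_selection import train_test_split

    accuracies = []
    for run in range(20):  # 20 independent random 50/50 splits
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=0.5, random_state=run)
        # select_and_score is a hypothetical helper, not from the paper.
        accuracies.append(select_and_score(X_tr, y_tr, X_te, y_te))
    print(np.mean(accuracies), np.std(accuracies))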
Hardware Specification: No
Evidence: The paper includes a running-time comparison but reports no hardware details (CPU/GPU models or other machine specifications) for the experiments.
Software Dependencies: No
Evidence: The paper mentions using "linear regression as our classifier" but does not specify any software libraries or dependencies with version numbers.
Experiment Setup: Yes
Evidence: "For each p value, we adjust λ such that the number of nonzero rows in W (the optimal solution) is q. We initialize W in Eq. (7) using two methods: (1) ridge regression, i.e., replacing the L2,p norm in Eq. (3) with the Frobenius norm, which gives a closed-form solution for W; (2) the global solution of W at p = 1. With 20 iterations, the solution is accurate to an error of less than 10^-14, close to machine precision." (A sketch of the ridge initialization follows this entry.)
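Initialization (1) has a standard closed form: replacing the L2,p penalty with a Frobenius-norm penalty μ||W||_F^2 turns the objective into ridge regression, whose solution is W0 = (XX^T + μI)^{-1} XY. A minimal sketch; the value of μ is an assumption, since the report does not quote the paper's choice:

    import numpy as np

    def ridge_init(X, Y, mu=1e-3):
        """Closed-form ridge solution used to initialize W (method (1))."""
        d = X.shape[0]
        return np.linalg.solve(X @ X.T + mu * np.eye(d), X @ Y)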