Clustering-Based Joint Feature Selection for Semantic Attribute Prediction

Authors: Lin Chen, Baoxin Li

IJCAI 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on both a synthetic dataset and multiple public benchmark datasets demonstrate that the proposed approach effectively captures the correlation among multiple attributes and significantly outperforms the state-of-the-art approaches.
Researcher Affiliation | Academia | Lin Chen and Baoxin Li, Arizona State University, Tempe, Arizona. {lin.chen.cs, baoxin.li}@asu.edu
Pseudocode | Yes | Algorithm 1: Feature Selection Optimization; Algorithm 2: Cluster Assignment Estimation
Open Source Code | No | The paper does not provide concrete access to its own source code for the methodology described.
Open Datasets | Yes | We verify the feature selection capability on general multilabel datasets in this section. The experiment is conducted on 6 public benchmark feature selection datasets: one object image dataset, COIL100 [COI, 1996]; one handwritten digit image dataset, USPS [Hull, 1994]; one spoken letter speech dataset, Isolet [Fanty and Cole, 1991]; and three face image datasets, Yale B [Georghiades et al., 2001], ORL [Samaria and Harter, 1994] and PIX10P. The statistics of the datasets are summarized in Table 2. PIX10P is publicly available from https://featureselection.asu.edu/datasets.php
Dataset Splits | Yes | In each setup, 50% of the samples are randomly selected for training and the remainder are used for testing. Specific constraints are imposed to make sure the class labels of the training set are balanced. The whole experiment is repeated for 10 rounds and average accuracies are reported. For the SUN dataset, 60% of the categories are randomly split out as seen categories in each round, with the rest as unseen categories. During training, 50% of the samples are randomly and carefully drawn from each seen category to ensure the balance of the positive and negative attribute labels.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments.
Software Dependencies | No | The paper does not list specific software dependencies with version numbers needed to replicate the experiment.
Experiment Setup | Yes | The proposed framework has three important parameters: one controlling the sparsity of W, β controlling the contribution of modeling label correlation, and γ controlling the global penalty. We study the effect of each parameter by fixing the others to see how the performance of the proposed approach varies with the number of selected features. Figure 3 demonstrates the performance variation w.r.t. different parameters and the number of selected features.
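
The split protocol quoted under Dataset Splits (a random 50% training draw with balanced class labels, repeated for 10 rounds, with accuracies averaged) can be sketched roughly as follows. This is only an illustration and not the authors' code, which the paper does not release. The function names `balanced_half_split` and `average_accuracy` and the `evaluate` callback are hypothetical, and "balanced" is interpreted here as taking half of each class, which is an assumption about the paper's unspecified constraints.

```python
import random
from collections import defaultdict


def balanced_half_split(labels, rng):
    """Return (train_idx, test_idx), drawing half of each class for training.

    Assumption: "balanced" means every class contributes 50% of its samples
    to the training set; the paper does not spell out its exact constraints.
    """
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train, test = [], []
    for label in sorted(by_class):
        members = by_class[label][:]
        rng.shuffle(members)          # random draw within the class
        cut = len(members) // 2       # 50% of this class goes to training
        train.extend(members[:cut])
        test.extend(members[cut:])
    return sorted(train), sorted(test)


def average_accuracy(labels, evaluate, rounds=10, seed=0):
    """Repeat the split `rounds` times and average the returned accuracies.

    `evaluate(train_idx, test_idx)` is a hypothetical callback that trains
    on the first index list and returns accuracy on the second.
    """
    rng = random.Random(seed)
    scores = []
    for _ in range(rounds):
        train_idx, test_idx = balanced_half_split(labels, rng)
        scores.append(evaluate(train_idx, test_idx))
    return sum(scores) / len(scores)
```

Passing a seeded `random.Random` instance keeps the 10 rounds reproducible, which matters here because the paper reports neither seeds nor software versions.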