Fractal Autoencoders for Feature Selection

Authors: Xinxing Wu, Qiang Cheng

AAAI 2021, pp. 10370-10378 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "Extensive experimental results on fourteen datasets, including very high-dimensional data, have demonstrated the superiority of FAE over existing contemporary methods for unsupervised feature selection. We validate FAE with extensive experiments on fourteen real datasets. Although simple, it demonstrates state-of-the-art performance for reconstruction on many benchmarking datasets. It also yields superior performance in a downstream learning task of classification on most of the benchmarking datasets." |
| Researcher Affiliation | Academia | "University of Kentucky, Lexington, Kentucky, USA; xinxingwu@gmail.com, qiang.cheng@uky.edu" |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks, nor any clearly labeled algorithm sections or code-like formatted procedures; it presents mathematical formulations instead. |
| Open Source Code | Yes | "All experiments are implemented with Python 3.7.8, Tensorflow 1.14, and Keras 2.2.5. The codes can be found at https://github.com/xinxingwu-uk/FAE." |
| Open Datasets | Yes | "The benchmarking datasets used in this paper are Mice Protein Expression [1], COIL-20 (Nene, Nayar, and Murase 1996), Smartphone Dataset for Human Activity Recognition in Ambient Assisted Living (Anguita et al. 2013), ISOLET [2], MNIST (Lecun et al. 1998), MNIST-Fashion (Xiao, Rasul, and Vollgraf 2017), GEO [3], USPS, GLIOMA, leukemia, pixraw10P, Prostate GE, warp AR10P, SMK CAN 187, and arcene [4]." Footnotes: [1] http://archive.ics.uci.edu/ml/datasets/Mice+Protein+Expression; [2] http://archive.ics.uci.edu/ml/datasets/ISOLET; [3] https://cbcl.ics.uci.edu/public_data/D-GEX; [4] the last eight datasets are from the scikit-feature feature selection repository (Li et al. 2017). |
| Dataset Splits | Yes | "For MNIST and MNIST-Fashion, we randomly choose 6,000 samples from each training set to train and validate and 4,000 from each testing set for testing. And we randomly split the 6,000 samples into training and validation sets at a ratio of 90:10. For GEO, we randomly split the preprocessed GEO in the same way as D-GEX (Chen et al. 2016): 88,807 for training, 11,101 for validating, and 11,101 for testing. For other datasets, we randomly split them into training, validation, and testing sets by a ratio of 72:8:20." (A minimal split sketch follows the table.) |
| Hardware Specification | No | The paper does not provide specific hardware details such as exact GPU/CPU models, processor types, or memory amounts used for its experiments; it only describes the software environment. |
| Software Dependencies | Yes | "All experiments are implemented with Python 3.7.8, Tensorflow 1.14, and Keras 2.2.5." |
| Experiment Setup | Yes | "In experiments of FAE, we set the maximum number of epochs to be 1,000 for datasets 1-14 and 200 for dataset 15. We initialize the weights of the feature selection layer by sampling uniformly from U[0.999999, 0.9999999] and the other layers with the Xavier normal initializer. We adopt the Adam optimizer (Kingma and Ba 2015) with an initialized learning rate of 0.001. We set λ1 and λ2 in (3) to 2 and 0.1, respectively. For the hyper-parameter setting, we perform a grid search on the validation set, and then choose the optimal one." (A configuration sketch follows the table.) |
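
As a reading aid, the following is a minimal sketch of the 72:8:20 split described in the Dataset Splits row. It assumes scikit-learn's `train_test_split`; the paper's repository may implement its splits differently, and the helper name `split_72_8_20` is hypothetical.

```python
# Minimal sketch of the 72:8:20 train/validation/test split described above.
# Assumes scikit-learn; the paper's own repository may split differently.
import numpy as np
from sklearn.model_selection import train_test_split

def split_72_8_20(X, y, seed=0):
    """Randomly split (X, y) into 72% train, 8% validation, 20% test."""
    # First carve off the 20% test set.
    X_rest, X_test, y_rest, y_test = train_test_split(
        X, y, test_size=0.20, random_state=seed)
    # 8% of the whole data is 0.08 / 0.80 = 10% of the remaining 80%.
    X_train, X_val, y_train, y_val = train_test_split(
        X_rest, y_rest, test_size=0.10, random_state=seed)
    return (X_train, y_train), (X_val, y_val), (X_test, y_test)

# Example with random data of the MNIST dimensionality (784 features).
X = np.random.rand(1000, 784)
y = np.random.randint(0, 10, size=1000)
train, val, test = split_72_8_20(X, y)
print(len(train[0]), len(val[0]), len(test[0]))  # 720 80 200
```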
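
Similarly, the Experiment Setup row can be read as concrete Keras configuration. The sketch below is written against tf.keras (TensorFlow 2.x) rather than the paper's TensorFlow 1.14/Keras 2.2.5, and the toy dense autoencoder is a hypothetical stand-in: only the initializers and the Adam learning rate come from the row above, while the actual FAE architecture and the λ1/λ2-weighted loss of Eq. (3) live in the authors' repository.

```python
# Sketch of the reported initialization and optimizer settings in tf.keras.
# The two-layer model below is illustrative only; it is NOT the FAE
# architecture, which is defined at https://github.com/xinxingwu-uk/FAE.
from tensorflow import keras

# Feature-selection-layer weights drawn uniformly from U[0.999999, 0.9999999].
fs_init = keras.initializers.RandomUniform(minval=0.999999, maxval=0.9999999)
# All other layers use the Xavier (Glorot) normal initializer.
xavier_init = keras.initializers.GlorotNormal()

inputs = keras.Input(shape=(784,))  # e.g. MNIST dimensionality
# Hypothetical stand-in for the feature selection layer.
selected = keras.layers.Dense(784, use_bias=False,
                              kernel_initializer=fs_init)(inputs)
hidden = keras.layers.Dense(128, activation="relu",
                            kernel_initializer=xavier_init)(selected)
outputs = keras.layers.Dense(784, kernel_initializer=xavier_init)(hidden)

model = keras.Model(inputs, outputs)
# Adam with the reported initial learning rate of 0.001; plain MSE here,
# whereas the paper adds regularization terms weighted by λ1=2 and λ2=0.1.
model.compile(optimizer=keras.optimizers.Adam(learning_rate=0.001),
              loss="mse")
model.summary()
```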