Fractal Autoencoders for Feature Selection
Authors: Xinxing Wu, Qiang Cheng (pp. 10370-10378)
AAAI 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experimental results on fourteen datasets, including very high-dimensional data, demonstrate the superiority of FAE over existing contemporary methods for unsupervised feature selection. The authors validate FAE with extensive experiments on fourteen real datasets; although simple, it achieves state-of-the-art reconstruction performance on many benchmarking datasets and superior performance in a downstream classification task on most of them. |
| Researcher Affiliation | Academia | University of Kentucky, Lexington, Kentucky, USA xinxingwu@gmail.com, qiang.cheng@uky.edu |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks, nor any clearly labeled algorithm sections or code-like formatted procedures. It presents mathematical formulations. |
| Open Source Code | Yes | All experiments are implemented with Python 3.7.8, Tensorflow 1.14, and Keras 2.2.5. The codes can be found at https://github.com/xinxingwu-uk/FAE. |
| Open Datasets | Yes | The benchmarking datasets used in this paper are Mice Protein Expression (http://archive.ics.uci.edu/ml/datasets/Mice+Protein+Expression), COIL-20 (Nene, Nayar, and Murase 1996), Smartphone Dataset for Human Activity Recognition in Ambient Assisted Living (Anguita et al. 2013), ISOLET (http://archive.ics.uci.edu/ml/datasets/ISOLET), MNIST (Lecun et al. 1998), MNIST-Fashion (Xiao, Rasul, and Vollgraf 2017), GEO (https://cbcl.ics.uci.edu/public_data/D-GEX), USPS, GLIOMA, leukemia, pixraw10P, Prostate GE, warp AR10P, SMK CAN 187, and arcene; the last eight datasets are from the scikit-feature feature selection repository (Li et al. 2017). |
| Dataset Splits | Yes | For MNIST and MNIST-Fashion, we randomly choose 6,000 samples from each training set to train and validate and 4,000 from each testing set for testing, and we randomly split the 6,000 samples into training and validation sets at a ratio of 90:10. For GEO, we randomly split the preprocessed GEO in the same way as D-GEX (Chen et al. 2016): 88,807 for training, 11,101 for validating, and 11,101 for testing. For other datasets, we randomly split them into training, validation, and testing sets at a ratio of 72:8:20. |
| Hardware Specification | No | The paper does not provide specific hardware details such as exact GPU/CPU models, processor types, or memory amounts used for running its experiments. It only mentions the software environment. |
| Software Dependencies | Yes | All experiments are implemented with Python 3.7.8, Tensorflow 1.14, and Keras 2.2.5. |
| Experiment Setup | Yes | In the experiments of FAE, we set the maximum number of epochs to 1,000 for datasets 1-14 and 200 for dataset 15. We initialize the weights of the feature selection layer by sampling uniformly from U[0.999999, 0.9999999] and the other layers with the Xavier normal initializer. We adopt the Adam optimizer (Kingma and Ba 2015) with an initial learning rate of 0.001. We set λ1 and λ2 in (3) to 2 and 0.1, respectively. For the hyper-parameter setting, we perform a grid search on the validation set and then choose the optimal one. |
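The 72:8:20 split described in the Dataset Splits row can be sketched as follows. This is a minimal NumPy illustration, not the authors' code: the function name `split_72_8_20` and the fixed seed are assumptions, and the paper does not specify the exact shuffling procedure.

```python
import numpy as np

def split_72_8_20(n_samples, seed=0):
    """Randomly partition sample indices into train/validation/test
    sets at the 72:8:20 ratio reported in the paper (a sketch; the
    authors' exact shuffling procedure is not specified)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    n_train = int(round(0.72 * n_samples))
    n_val = int(round(0.08 * n_samples))
    train = idx[:n_train]
    val = idx[n_train:n_train + n_val]
    test = idx[n_train + n_val:]
    return train, val, test

train, val, test = split_72_8_20(1000)
print(len(train), len(val), len(test))  # 720 80 200
```

For MNIST/MNIST-Fashion the paper instead subsamples 6,000 training and 4,000 test examples and splits the 6,000 at 90:10, which the same helper could express with different ratios.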
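The unusual weight initialization in the Experiment Setup row (feature selection layer drawn from U[0.999999, 0.9999999], other layers Xavier normal) can be sketched in NumPy. The original implementation uses Keras 2.2.5 initializers; the function names here are illustrative, not from the authors' code.

```python
import numpy as np

rng = np.random.default_rng(42)

def feature_selection_init(shape):
    """Near-one uniform init U[0.999999, 0.9999999] for the feature
    selection layer, per the paper's reported setup."""
    return rng.uniform(0.999999, 0.9999999, size=shape)

def xavier_normal_init(fan_in, fan_out):
    """Xavier (Glorot) normal init for the other layers:
    N(0, sqrt(2 / (fan_in + fan_out)))."""
    std = np.sqrt(2.0 / (fan_in + fan_out))
    return rng.normal(0.0, std, size=(fan_in, fan_out))

# Example: a 784-feature input (MNIST-sized; an assumption for
# illustration) feeding a 128-unit hidden layer.
w_fs = feature_selection_init((784,))
w_hidden = xavier_normal_init(784, 128)
```

Training then uses Adam with an initial learning rate of 0.001 (in Keras 2.2.5, `keras.optimizers.Adam(lr=0.001)`).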