Feature selection in functional data classification with recursive maxima hunting

Authors: José L. Torrecilla, Alberto Suárez

NeurIPS 2016

Reproducibility assessment: for each variable below, the assessed result is followed by the supporting LLM response (quoted from the paper).
Research Type: Experimental
LLM Response: "The results of an extensive empirical evaluation are used to illustrate that, in the problems investigated, RMH has comparable or higher predictive accuracy than standard dimensionality reduction techniques, such as PCA and PLS, and state-of-the-art feature selection methods for functional data, such as maxima hunting."
Researcher Affiliation: Academia
LLM Response: "José L. Torrecilla, Computer Science Department, Universidad Autónoma de Madrid, 28049 Madrid, Spain, joseluis.torrecilla@uam.es; Alberto Suárez, Computer Science Department, Universidad Autónoma de Madrid, 28049 Madrid, Spain, alberto.suarez@uam.es"
Pseudocode: Yes
LLM Response (Algorithm 1: Recursive Maxima Hunting):

function RMH(X(t), Y)
    t <- []                                # vector of selected points, initially empty
    RMH_rec(X(t), Y, 0, 1)                 # recursive search of the maxima of R^2(X(t), Y)
    return t                               # vector of selected points
end function

procedure RMH_rec(X(t), Y, t_inf, t_sup)
    t_max <- argmax_{t_inf <= t <= t_sup} R^2(X(t), Y)
    if R^2(X(t_max), Y) > s then
        t <- [t, t_max]                    # include t_max in t, the vector of selected points
        X(t) <- X(t) - E(X(t) | X(t_max)), t in [t_inf, t_sup]   # correction of type (5) or (6), as required
    else
        return
    end if
    # Exclude redundant points to the left of t_max
    t_max^- <- max { t_inf <= t < t_max : R^2(X(t_max), X(t)) <= r }
    if t_max^- > t_inf then
        RMH_rec(X(t), Y, t_inf, t_max^-)   # recursion on left subinterval
    end if
    # Exclude redundant points to the right of t_max
    t_max^+ <- min { t_max < t <= t_sup : R^2(X(t_max), X(t)) <= r }
    if t_max^+ < t_sup then
        RMH_rec(X(t), Y, t_max^+, t_sup)   # recursion on right subinterval
    end if
end procedure
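The algorithm above lends itself to a compact implementation. Below is a minimal, hypothetical Python sketch (not the authors' code): the dependence measure R^2 (the distance correlation in the paper) is approximated by the squared Pearson correlation, and the conditional-expectation correction E(X(t) | X(t_max)) is approximated by an empirical linear regression rather than the paper's Brownian-motion corrections (5)-(6). The names rmh and _r2 are invented for this sketch.

import numpy as np

def _r2(u, v):
    # Squared Pearson correlation as a stand-in for the paper's R^2.
    if np.std(u) == 0.0 or np.std(v) == 0.0:
        return 0.0
    return float(np.corrcoef(u, v)[0, 1] ** 2)

def rmh(X, y, s=0.05, r=0.8):
    # X: array of shape (n_samples, n_points) with the discretized curves.
    # y: array of shape (n_samples,) with binary class labels.
    X = np.asarray(X, dtype=float).copy()
    y = np.asarray(y, dtype=float)
    selected = []

    def rmh_rec(lo, hi):
        if lo > hi:
            return
        scores = [_r2(X[:, t], y) for t in range(lo, hi + 1)]
        k = int(np.argmax(scores))
        t_max = lo + k
        if scores[k] <= s:                      # relevance threshold not met
            return
        selected.append(t_max)
        x_max = X[:, t_max].copy()              # values before the correction
        # Redundancy of every point with t_max, measured before the correction.
        red = [_r2(x_max, X[:, t]) for t in range(lo, hi + 1)]
        # Subtract a linear-regression proxy of E(X(t) | X(t_max)).
        var = float(np.var(x_max))
        if var > 0.0:
            for t in range(lo, hi + 1):
                beta = np.cov(X[:, t], x_max, bias=True)[0, 1] / var
                X[:, t] -= beta * x_max
        # Recurse on the left subinterval, skipping points redundant with t_max.
        left = [t for t in range(lo, t_max) if red[t - lo] <= r]
        if left:
            rmh_rec(lo, max(left))
        # Recurse on the right subinterval, likewise.
        right = [t for t in range(t_max + 1, hi + 1) if red[t - lo] <= r]
        if right:
            rmh_rec(min(right), hi)

    rmh_rec(0, X.shape[1] - 1)
    return sorted(selected)

On a discretized sample, rmh(X, y) returns the grid indices of the selected points, which would then feed a standard multivariate classifier such as kNN.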
Open Source Code: No
LLM Response: "No explicit statement or link providing concrete access to the source code for the described methodology was found."
Open Datasets: Yes
LLM Response: "To assess the performance of RMH in real-world functional classification problems, we have carried out a second batch of experiments on four datasets that are commonly used as benchmarks in the FDA literature. Instances in Growth correspond to curves of the heights of 54 girls and 38 boys from the Berkeley Growth Study. Observations are discretized at 31 non-equidistant ages between 1 and 18 years (Ramsay and Silverman, 2005; Mosler and Mozharovskyi, 2014). The Tecator dataset consists of 215 near-infrared absorbance spectra of finely chopped meat. The spectral curves consist of 100 equally spaced points. The class labels are determined in terms of fat content (above or below 20%). The curves are fairly smooth; in consequence, we have followed the general recommendation and used the second derivative for classification (Ferraty and Vieu, 2006; Galeano et al., 2014). The Phoneme dataset consists of 4509 log-periodograms observed at 256 equidistant points. Here, we consider the binary problem of distinguishing between the phonemes aa (695 instances) and ao (1022 instances) (Galeano et al., 2014). Following Delaigle and Hall (2012a), the curves are smoothed with a local linear method and truncated to the first 50 variables. The Medflies dataset consists of records of the daily egg-laying patterns of a thousand flies. The goal is to discriminate between short- and long-lived flies. Following Mosler and Mozharovskyi (2014), curves equal to zero are excluded. There are 512 thirty-day curves (starting from day 5) of flies that live at least 34 days; 266 of these are long-lived (reach day 44). The classes in Growth and Tecator are well separated; in consequence, they are relatively easy problems. By contrast, Phoneme and Medflies are notoriously difficult classification tasks."
Dataset Splits: Yes
LLM Response: "The value of k in kNN is selected by 10-fold CV from the integers in [1, N_train], where N_train is the size of the training set. The value of s is selected from the set {0.025, 0.05, 0.1} by 10-fold CV. In the other methods, the number of selected variables is determined using 10-fold CV, with a maximum of 30. In these experiments [on synthetic data], training samples of different sizes (N_train = 50, 100, 200, 500, 1000) and an independent test set of size 1000 are generated. To estimate the classification error [on the real-world datasets], the datasets are partitioned at random into a training set (with 2/3 of the observations) and a test set (1/3)."
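As a concrete illustration of this split-and-selection protocol, here is a hypothetical sketch. The paper does not name its software, so scikit-learn is an assumption, and the feature matrix below is a synthetic placeholder:

import numpy as np
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 50))          # placeholder for the selected features
y = rng.integers(0, 2, size=300)        # placeholder binary labels

# Real-world datasets: random partition, 2/3 training and 1/3 test.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=1/3, random_state=0)

# k in kNN chosen by 10-fold CV; the paper searches the integers in
# [1, N_train], truncated here to keep the sketch fast.
grid = GridSearchCV(
    KNeighborsClassifier(),
    param_grid={"n_neighbors": list(range(1, 31))},
    cv=10,
)
grid.fit(X_tr, y_tr)
print(grid.best_params_, grid.score(X_te, y_te))  # selected k, test accuracy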
Hardware Specification: No
LLM Response: "No specific hardware details (such as CPU/GPU models or machine configurations) used for running the experiments are mentioned in the paper."
Software Dependencies: No
LLM Response: "The paper mentions methods such as k-nearest neighbors (kNN) and the linear Fisher discriminant, but it names no software packages and provides no version numbers, which would be needed for a reproducible description."
Experiment Setup: Yes
LLM Response: "RMH requires determining the values of two hyperparameters: the redundancy threshold r (0 < r < 1, typically close to 1) and the relevance threshold s (0 < s < 1, typically close to 0). Through extensive simulations we have observed that RMH is quite robust for a wide range of appropriate values of these parameters. In particular, the results are very similar for values of r in the interval [0.75, 0.95]. The predictive accuracy is somewhat more sensitive to the choice of s: if s is too small, irrelevant variables can be selected; if s is too large, relevant points may be excluded. For most of the experiments performed, the optimal values of s are between 0.025 and 0.1. In view of these observations, the experiments are performed using r = 0.8. The value of s is selected from the set {0.025, 0.05, 0.1} by 10-fold CV. A more careful determination of r and s can be beneficial, especially in some extreme problems (e.g., with very smooth or with rapidly varying trajectories). In RMH, the number of selected variables is not fixed beforehand; it depends indirectly on the values of r and s. In the other methods, the number of selected variables is determined using 10-fold CV, with a maximum of 30."
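A hypothetical sketch of this hyperparameter protocol, reusing the rmh() function sketched above (again an assumption, not the authors' code): r is fixed at 0.8 and s is chosen from {0.025, 0.05, 0.1} by 10-fold CV on the training data.

import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def select_s_by_cv(X_tr, y_tr, candidates=(0.025, 0.05, 0.1), r=0.8):
    # Pick the relevance threshold s whose selected features give the best
    # 10-fold CV accuracy with a kNN classifier on the training data.
    best_s, best_acc = None, -np.inf
    for s in candidates:
        idx = rmh(X_tr, y_tr, s=s, r=r)     # rmh() from the sketch above
        if not idx:
            continue                        # no point passed the threshold
        acc = cross_val_score(
            KNeighborsClassifier(), X_tr[:, idx], y_tr, cv=10
        ).mean()
        if acc > best_acc:
            best_s, best_acc = s, acc
    return best_s

For simplicity this sketch runs the feature selection once before cross-validating; a stricter protocol would rerun RMH inside each fold to avoid selection bias.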