Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
An Importance Weighted Feature Selection Stability Measure
Authors: Victor Hamer, Pierre Dupont
JMLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "We illustrate, theoretically and experimentally, that current stability measures are subject to undesirable behaviors, for example, when they are jointly optimized with predictive accuracy. Results on micro-array and mass-spectrometric data show that our novel stability measure corrects for overly optimistic stability estimates in such a bi-objective context, which leads to improved decision-making. It is also shown to be less prone to the under- or over-estimation of the stability value in feature spaces with groups of highly correlated variables." and "In this experimental section, we study the behavior of the stability measures φ, φpears and φiw in the context of joint optimization with predictive accuracy (Section 7.1). We evaluate the stability of classical feature selection approaches according to these measures in Section 7.2 before briefly comparing their sampling distributions in Section 7.3." |
| Researcher Affiliation | Academia | Victor Hamer EMAIL Pierre Dupont EMAIL UCLouvain ICTEAM/INGI/Machine Learning Group, Place Sainte-Barbe 2, B-1348 Louvain-la-Neuve, Belgium. |
| Pseudocode | Yes | Algorithm 1 Hybrid RFE. 1: procedure SelectFeatures(N, λ, ϵ, λ_f) 2: F ← the set of all features 3: r_f ← univariate criterion rank of each feature (descending order) 4: S_N ← {f : r_f ≤ N} 5: β_f ← ϵ if f ∈ S_N, 1 otherwise 6: while |F| > k do 7: w* ← argmin_w Σ_{i=1}^n log(1 + exp(−y_i w⊤x_i)) + λ‖β ⊙ w‖² 8: r ← rank of features {f ∈ F \ S_N} by |w*_f| in descending order 9: F ← F \ {f : r_f = |F| − N} 10: w* ← argmin_w Σ_{i=1}^n log(1 + exp(−y_i w⊤x_i)) + λ_f‖w‖² 11: return (F, w*) |
| Open Source Code | No | The paper does not provide explicit statements about open-sourcing the code for the methodology described, nor does it provide a link to a code repository. |
| Open Datasets | Yes | The studied data sets are summarized in Table 3. They all have a small n (number of samples) to d (number of features) ratio, which generally causes feature selection methods to be particularly unstable. The learning task consists in predicting whether or not a patient is suffering from the corresponding disease. As is often done when dealing with high dimensional data sets, the feature space is first pre-filtered by removing the features with lowest variance (except for alon and gravier, for which such a pre-filtering has already been performed). The amount of pre-filtering is chosen so as to maximize the predictive performance of the classical rfe (N = 0) and is kept constant for all experiments. To measure the accuracy and stability obtained with a given set of meta-parameters, we use the classical bootstrap protocol, which draws with replacement M samples of the same size as the original data set. Each model is evaluated on the out-of-bag examples and the mean classification accuracy is reported. The selection stability is evaluated using Equations (1) (φ), (4) (φpears) and (6) (φiw), over the M = 100 resamplings. We perform experiments using the hybrid-rfe with the additional meta-parameter α, introduced in Section 6.2. The N pre-selected features are ranked according to the considered univariate criterion and are put, in that order, on top of the rfe ranking at each iteration, such that their frequency score scf given by Equation (15) is the highest. Increasing α is thus expected to increase the importance of these N pre-selected features in the predictive models, as they are less regularized (Equation 16). The table 'Table 3: Information on used data sets, from the UCI machine learning repository (arcene) and from the datamicroarray R package for the others.' indicates public availability. |
| Dataset Splits | Yes | To measure the accuracy and stability obtained with a given set of meta-parameters, we use the classical bootstrap protocol which draws with replacement M samples of the same size as the original data set. Each model is evaluated on the out-of-bag examples and the mean classification accuracy is reported. The selection stability is evaluated using Equation (1)(φ), (4)(φpears) and (6)(φiw), over the M = 100 resamplings. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments. It mentions using 'deep neural networks' but no hardware specifications are given. |
| Software Dependencies | No | The paper mentions the use of 'mvrnorm R package' and 'CORElearn R package' but does not specify their version numbers. No other software dependencies with specific version numbers are provided. |
| Experiment Setup | Yes | "We perform experiments using the hybrid-rfe with the additional meta-parameter α, introduced in Section 6.2. The N pre-selected features are ranked according to the considered univariate criterion and are put, in that order, on top of the rfe ranking at each iteration, such that their frequency score scf given by Equation (15) is the highest. Increasing α is thus expected to increase the importance of these N pre-selected features in the predictive models, as they are less regularized (Equation 16)." and "The regularization parameter λ of the lasso and group lasso is set so as to select approximately 40 features when the size c of the correlated groups is equal to 1." and "We aim at selecting min(20, d) features, while we set M to 30." and "The lasso, used in the context of logistic regression, finds the linear model w minimizing Σ_{i=1}^n log(1 + exp(−y_i w⊤x_i)) + λ‖w‖₁." and "Random forests... Stability is clearly increased when the forest size grows before converging for T ≤ 1000." and "The relief algorithm... K nearest instances have equal weights." |
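The hybrid-RFE pseudocode quoted in the Pseudocode row can be sketched in Python. This is a minimal illustration, not the authors' implementation: it assumes the univariate criterion is absolute correlation with the label, removes one non-protected feature per iteration, and fits the penalized logistic loss by plain gradient descent (the names `hybrid_rfe` and `_fit_logreg` are ours).

```python
import numpy as np

def _fit_logreg(X, y, pen, n_iter=300, lr=0.5):
    """Minimize sum_i log(1 + exp(-y_i w.x_i)) + sum_f pen_f * w_f^2
    by plain gradient descent. y must be in {-1, +1}."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_iter):
        z = np.clip(y * (X @ w), -30, 30)          # margins, clipped for stability
        grad = -(X.T @ (y / (1.0 + np.exp(z)))) + 2.0 * pen * w
        w -= lr * grad / n
    return w

def hybrid_rfe(X, y, k, N, lam=1.0, eps=0.01, lam_f=1.0):
    """Sketch of Algorithm 1 (hybrid RFE). Assumes N <= k so the
    protected set S_N is never emptied below the target size k."""
    n, d = X.shape
    corr = np.abs(X.T @ y)                         # assumed univariate criterion
    S_N = set(np.argsort(-corr)[:N])               # protected top-N features
    active = list(range(d))
    while len(active) > k:
        # protected features get the smaller penalty weight eps
        beta = np.array([eps if f in S_N else 1.0 for f in active])
        w = _fit_logreg(X[:, active], y, lam * beta ** 2)
        # drop the weakest non-protected feature (one per iteration)
        removable = [i for i, f in enumerate(active) if f not in S_N]
        active.pop(min(removable, key=lambda i: abs(w[i])))
    # final refit with the uniform penalty lambda_f
    w_final = _fit_logreg(X[:, active], y, lam_f * np.ones(len(active)))
    return active, w_final
```

Because only non-protected features are ever ranked for removal, the N pre-selected features are guaranteed to survive every iteration, matching the intent of the β_f = ϵ weighting in the algorithm.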
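The bootstrap protocol quoted in the Open Datasets and Dataset Splits rows (M resamplings with replacement, out-of-bag accuracy) can be sketched as below. The paper's stability measures φ, φpears and φiw are not reproduced here; a simple mean pairwise Jaccard overlap of the selected subsets stands in for them, and `select`/`evaluate` are hypothetical callables supplied by the caller.

```python
import numpy as np

def bootstrap_stability(X, y, select, evaluate, M=100, seed=0):
    """Draw M bootstrap samples of size n with replacement; fit the
    feature selector on each, score the model on its out-of-bag
    examples, and report mean OOB accuracy plus a simple stability
    value (mean pairwise Jaccard overlap of the selected subsets)."""
    rng = np.random.default_rng(seed)
    n = len(y)
    subsets, accs = [], []
    for _ in range(M):
        idx = rng.integers(0, n, size=n)           # bootstrap sample indices
        oob = np.setdiff1d(np.arange(n), idx)      # out-of-bag examples
        feats, model = select(X[idx], y[idx])
        subsets.append(set(feats))
        if len(oob) > 0:
            accs.append(evaluate(model, X[np.ix_(oob, list(feats))], y[oob]))
    jac = [len(a & b) / len(a | b)
           for i, a in enumerate(subsets) for b in subsets[i + 1:]]
    return float(np.mean(accs)), float(np.mean(jac))
```

A selector that always returns the same feature subset yields stability 1.0 under this overlap measure, the same boundary behavior the paper's measures are designed to have at perfect stability.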
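The lasso objective quoted in the Experiment Setup row, Σ_{i=1}^n log(1 + exp(−y_i w⊤x_i)) + λ‖w‖₁, maps directly onto scikit-learn's L1-penalized logistic regression, whose `C` parameter is the inverse of λ. A minimal sketch (the helper name `lasso_select` is ours, not from the paper):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def lasso_select(X, y, lam):
    """L1-penalized logistic regression as a feature selector:
    the selected features are those with nonzero coefficients.
    sklearn's C is 1/lambda relative to the paper's objective."""
    clf = LogisticRegression(penalty="l1", solver="liblinear", C=1.0 / lam)
    clf.fit(X, y)
    return np.flatnonzero(clf.coef_[0])   # indices of selected features
```

In the paper's protocol, λ would be tuned so that roughly 40 features remain; here it is simply a free argument.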