Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Minimal Learning Machine: Theoretical Results and Clustering-Based Reference Point Selection

Authors: Joonas Hämäläinen, Alisson S. C. Alencar, Tommi Kärkkäinen, César L. C. Mattos, Amauri H. Souza Júnior, João P. P. Gomes

JMLR 2020 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental First, we detail the theoretical aspects that assure the MLM's interpolation and universal approximation capabilities, which had previously only been empirically verified. Second, we identify the major importance of the task of selecting reference points for the MLM's generalization capability. Several clustering-based methods for reference point selection in regression scenarios are then proposed and analyzed. Based on an extensive empirical evaluation, we conclude that the evaluated methods are both scalable and useful. Specifically, for a small number of reference points, the clustering-based methods outperform the standard random selection of the original MLM formulation. We validated this paper's empirical contributions through computational experiments with 15 regression data sets.
Researcher Affiliation Academia University of Jyväskylä, Faculty of Information Technology, P.O. Box 35, FI-40014 University of Jyväskylä, Finland; Federal University of Ceará (UFC), Department of Computer Science, Fortaleza-CE, Brazil; Federal Institute of Education, Science and Technology of Ceará (IFCE), Department of Computer Science, Maracanaú-CE, Brazil
Pseudocode Yes Algorithm 1 MLM output prediction with LLS. Input: input x, distance regression model B, reference points R and T. Output: predicted output y.
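The prediction step quoted above can be sketched as follows. This is a minimal illustration of the MLM idea, not the authors' code: the toy data, the random reference-point selection (the baseline the paper's clustering methods improve on), and the use of SciPy's general-purpose solver for the output-estimation step are all assumptions for the sketch.

```python
import numpy as np
from scipy.optimize import least_squares

# Toy regression data (made up for illustration).
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, (200, 2))          # inputs
Y = np.sin(X[:, :1]) + X[:, 1:] ** 2      # outputs

# Random reference-point selection, as in the original MLM formulation.
idx = rng.choice(len(X), size=20, replace=False)
R, T = X[idx], Y[idx]                     # input / output reference points

# Training: fit the distance regression model B so that D_y ≈ D_x B,
# where D_x and D_y are distances to the reference points.
Dx = np.linalg.norm(X[:, None] - R[None], axis=2)
Dy = np.linalg.norm(Y[:, None] - T[None], axis=2)
B, *_ = np.linalg.lstsq(Dx, Dy, rcond=None)

def predict(x):
    """Predict distances from y to T, then recover y by multilateration."""
    delta = np.linalg.norm(x - R, axis=1) @ B   # predicted output-space distances
    obj = lambda y: np.linalg.norm(y - T, axis=1) ** 2 - delta ** 2
    return least_squares(obj, x0=T.mean(axis=0)).x

y_hat = predict(np.array([0.3, -0.5]))
```

The key point the paper examines is the choice of the rows of R and T: here they are picked at random, whereas the proposed methods pick them via clustering.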
Open Source Code No The paper does not provide any explicit statements about releasing source code for the methodology, nor does it provide a direct link to a code repository.
Open Datasets Yes The original S1 data set is available at http://cs.uef.fi/sipu/datasets/. The remaining data sets are available at http://www.dcc.fc.up.pt/~ltorgo/Regression/DataSets.html and at http://archive.ics.uci.edu/ml/index.php.
Dataset Splits Yes We divided the original data sets into train-validation-test sets and performed cross-validation (see, e.g., Friedman et al., 2001, Chapter 7). More precisely, we used the 3-DOB-SCV (Moreno-Torres et al., 2012) approach to divide each data set into a training set and a test set. Therefore, the test set was forced to approximate the same distribution as the training set, making the comparison more reliable if concept drift is not considered. Because we focused only on regression tasks, we used DOB-SCV as a one-class case (Hämäläinen, 2018; Hämäläinen and Kärkkäinen, 2016). Moreover, we archived three training sets and three test sets for each data set, respectively, with sizes of 2/3 and 1/3 of the number of observations. In training, we used the 10-DOB-SCV approach to select the optimal number of reference points. Hence, 18/30 of the number of observations were used to train the model and 2/30 of the number of observations were used to compute the validation error.
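The fractions quoted above follow from composing the two splits: 3-DOB-SCV leaves 2/3 for training, and 10-DOB-SCV within that training set holds out one tenth for validation. A quick check of the arithmetic (illustrative only, not the authors' code):

```python
from fractions import Fraction

train = Fraction(2, 3)          # 3-DOB-SCV: 2/3 train, 1/3 test
test = 1 - train

fit = train * Fraction(9, 10)   # 10-DOB-SCV: 9/10 of training fits the model
val = train * Fraction(1, 10)   # 1/10 of training computes validation error

assert fit == Fraction(18, 30)  # matches the reported 18/30
assert val == Fraction(2, 30)   # matches the reported 2/30
```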
Hardware Specification No The paper states, "All the experiments were conducted in a MATLAB environment," but does not specify any hardware details such as GPU/CPU models, processor types, or memory amounts.
Software Dependencies No The paper mentions utilizing "MATLAB's mldivide function" and that experiments were conducted in a "MATLAB environment," but it does not specify any version numbers for MATLAB or other software dependencies.
Experiment Setup Yes In training, the number of reference points K_rel varied in the range of [5, 100], with a step size of 5.