Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Minimal Learning Machine: Theoretical Results and Clustering-Based Reference Point Selection
Authors: Joonas Hämäläinen, Alisson S. C. Alencar, Tommi Kärkkäinen, César L. C. Mattos, Amauri H. Souza Júnior, João P. P. Gomes
JMLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | First, we detail the theoretical aspects that assure the MLM's interpolation and universal approximation capabilities, which had previously only been empirically verified. Second, we identify the major importance of the task of selecting reference points for the MLM's generalization capability. Several clustering-based methods for reference point selection in regression scenarios are then proposed and analyzed. Based on an extensive empirical evaluation, we conclude that the evaluated methods are both scalable and useful. Specifically, for a small number of reference points, the clustering-based methods outperform the standard random selection of the original MLM formulation. We validated this paper's empirical contributions through computational experiments with 15 regression data sets. |
| Researcher Affiliation | Academia | University of Jyväskylä, Faculty of Information Technology, P.O. Box 35, FI-40014 University of Jyväskylä, Finland; Federal University of Ceará (UFC), Department of Computer Science, Fortaleza-CE, Brazil; Federal Institute of Education, Science and Technology of Ceará (IFCE), Department of Computer Science, Maracanaú-CE, Brazil |
| Pseudocode | Yes | Algorithm 1: MLM output prediction with LLS. Input: input x, distance regression model B, reference points R and T. Output: predicted output y. |
| Open Source Code | No | The paper does not provide any explicit statements about releasing source code for the methodology, nor does it provide a direct link to a code repository. |
| Open Datasets | Yes | The original S1 data set is available at http://cs.uef.fi/sipu/datasets/. The remaining data sets are available at http://www.dcc.fc.up.pt/~ltorgo/Regression/DataSets.html and at http://archive.ics.uci.edu/ml/index.php. |
| Dataset Splits | Yes | We divided the original data sets into train-validation-test sets and performed cross-validation (see, e.g., Friedman et al., 2001, Chapter 7). More precisely, we used the 3-DOB-SCV (Moreno-Torres et al., 2012) approach to divide each data set into a training set and a test set. Therefore, the test set was forced to approximate the same distribution as the training set, making the comparison more reliable if concept drift is not considered. Because we focused only on regression tasks, we used DOB-SCV as a one-class case (Hämäläinen, 2018; Hämäläinen and Kärkkäinen, 2016). Moreover, we archived three training sets and three test sets for each data set, respectively, with sizes of 2/3 and 1/3 of the number of observations. In training, we used the 10-DOB-SCV approach to select the optimal number of reference points. Hence, 18/30 of the number of observations were used to train the model and 2/30 of the number of observations were used to compute the validation error. |
| Hardware Specification | No | The paper states, "All the experiments were conducted in a MATLAB environment," but does not specify any hardware details such as GPU/CPU models, processor types, or memory amounts. |
| Software Dependencies | No | The paper mentions utilizing "MATLAB's mldivide function" and that experiments were conducted in a "MATLAB environment," but it does not specify any version numbers for MATLAB or other software dependencies. |
| Experiment Setup | Yes | In training, the number of reference points K_rel varied in the range [5, 100], with a step size of 5. |
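The Pseudocode row quotes Algorithm 1, MLM output prediction with LLS (linear least squares). Since the Open Source Code row records that no code was released, the prediction step can only be sketched. The sketch below uses the standard LLS linearization of the multilateration problem (subtracting one squared-distance equation from the others); all names (`mlm_predict_lls`, `B`, `R`, `T`) are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def mlm_predict_lls(x, B, R, T):
    """Sketch of MLM output prediction with an LLS multilateration step.

    x : (d,) new input; R : (K, d) input reference points;
    T : (K, m) output reference points; B : (K, K) distance-regression
    coefficients mapping input-space distances to output-space distances.
    """
    # Step 1: distances from x to the input reference points.
    d_in = np.linalg.norm(R - x, axis=1)
    # Step 2: regress the distances from the unknown output y to T.
    delta = d_in @ B
    # Step 3: linearize ||y - t_k||^2 = delta_k^2 by subtracting the last
    # equation from the others, yielding a linear system A y = b.
    tK, dK = T[-1], delta[-1]
    A = 2.0 * (T[:-1] - tK)
    b = (np.sum(T[:-1] ** 2, axis=1) - np.sum(tK ** 2)
         - delta[:-1] ** 2 + dK ** 2)
    # Step 4: least-squares solution gives the predicted output.
    y, *_ = np.linalg.lstsq(A, b, rcond=None)
    return y
```

With exact distances the linearized system is consistent and recovers y exactly; with regressed distances, the least-squares solution serves as the prediction.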
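The Dataset Splits row relies on DOB-SCV (distribution-optimally-balanced stratified cross-validation) in its one-class form for regression. A minimal sketch of that fold assignment, assuming Euclidean distance and with all names (`dob_scv_folds`) illustrative: repeatedly pick an unassigned point and spread it and its k-1 nearest unassigned neighbours across the k folds, so that each fold approximates the data distribution.

```python
import numpy as np

def dob_scv_folds(X, k, rng=None):
    """One-class DOB-SCV sketch: assign each point and its k-1 nearest
    unassigned neighbours to distinct folds. Returns an (n,) fold index."""
    rng = np.random.default_rng(rng)
    n = len(X)
    folds = -np.ones(n, dtype=int)
    unassigned = set(range(n))
    while unassigned:
        idx = list(unassigned)
        seed = idx[rng.integers(len(idx))]
        rest = np.array([i for i in idx if i != seed])
        if rest.size == 0:          # single leftover point
            folds[seed] = 0
            break
        # k-1 nearest unassigned neighbours of the seed point
        d = np.linalg.norm(X[rest] - X[seed], axis=1)
        nearest = rest[np.argsort(d)[:k - 1]]
        for f, i in enumerate([seed, *nearest]):
            folds[i] = f
            unassigned.discard(i)
    return folds
```

Under this reading, the paper's 3-DOB-SCV train/test division corresponds to k=3 (each fold in turn forming the 1/3 test set) and the 10-DOB-SCV validation step to k=10 within each training set.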