Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Fast Kernel Methods for Generic Lipschitz Losses via $p$-Sparsified Sketches
Authors: Tamim El Ahmad, Pierre Laforgue, Florence d'Alché-Buc
TMLR 2023 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our theoretical results are complemented with experiments showing the empirical superiority of our approach over state-of-the-art sketching methods. We now empirically compare the performance of p-sparsified sketches against state-of-the-art approaches, namely Nyström approximation (Williams & Seeger, 2001), Gaussian sketch (Yang et al., 2017), Accumulation sketch (Chen & Yang, 2021a), Count Sketch (Clarkson & Woodruff, 2017), and Random Fourier Features (Rahimi & Recht, 2007). |
| Researcher Affiliation | Academia | Tamim El Ahmad (LTCI, Télécom Paris, IP Paris, France); Pierre Laforgue (Department of Computer Science, University of Milan, Italy); Florence d'Alché-Buc (LTCI, Télécom Paris, IP Paris, France) |
| Pseudocode | Yes | (Appendix D) Detailed algorithm of the generation and the decomposition of a p-sparsified sketch. In this section, we detail the process of generating a p-sparsified sketch and decomposing it as a product of a sub-Gaussian sketch S_SG and a sub-sampling sketch S_SS. Algorithm 1: Generation of a p-sparsified sketch. Input: s, n and p. Generate an s × n matrix B whose entries are i.i.d. Bernoulli random variables of parameter p. indices ← indices of non-null columns of B. B′ ← B where all null columns have been deleted. Generate a matrix M_SG of the same size as B′ whose entries are either i.i.d. Gaussian or Rademacher random variables. S_SG ← M_SG ⊙ B′, where ⊙ denotes the component-wise (Hadamard) matrix product. Return S_SG and indices. |
| Open Source Code | No | No explicit statement about the release of open-source code for the methodology described in this paper or a link to a code repository was found. |
| Open Datasets | Yes | We showcase the behaviour of the proposed algorithm for Joint Sketched Quantile Regression on two datasets: the Boston Housing dataset (Harrison Jr & Rubinfeld, 1978), composed of 506 data points devoted to house price prediction, and the Fish Otoliths dataset (Moen et al., 2018; Ordoñez et al., 2020), dedicated to fish age prediction from images of otoliths (calcium carbonate structures), composed of train and test sets of sizes 3780 and 165, respectively. The results are averages over 10 random 70%/30% train-test splits for the Boston dataset. For the Otoliths dataset we kept the initial given train-test split. The results are reported in Table 1. Sketching allows for a massive reduction of the training times while preserving the statistical performances. As a comparison, according to the results of Sangnier et al. (2016), the best benchmark result for the Boston dataset in terms of test pinball loss is 47.4, while the best test crossing loss is 0.48, which shows that our implementation does not compete in terms of quantile prediction but preserves the non-crossing property. To conduct these experiments, the train-test splits used are the ones available at http://mulan.sourceforge.net/datasets-mtr.html. Besides, we used the multi-output Kernel Ridge Regression framework with an input Gaussian kernel and an operator M = Id. We selected the regularisation parameter λn and the bandwidth σ² of the kernel via 5-fold cross-validation. Results are averages over 30 replicates for sketched models. |
| Dataset Splits | Yes | The results are averages over 10 random 70%/30% train-test splits for the Boston dataset. For the Otoliths dataset we kept the initial given train-test split. The results are reported in Table 1. Sketching allows for a massive reduction of the training times while preserving the statistical performances. As a comparison, according to the results of Sangnier et al. (2016), the best benchmark result for the Boston dataset in terms of test pinball loss is 47.4, while the best test crossing loss is 0.48, which shows that our implementation does not compete in terms of quantile prediction but preserves the non-crossing property. To conduct these experiments, the train-test splits used are the ones available at http://mulan.sourceforge.net/datasets-mtr.html. Besides, we used the multi-output Kernel Ridge Regression framework with an input Gaussian kernel and an operator M = Id. We selected the regularisation parameter λn and the bandwidth σ² of the kernel via 5-fold cross-validation. Results are averages over 30 replicates for sketched models. |
| Hardware Specification | No | No specific hardware details (such as exact GPU/CPU models, processor types, or memory amounts) used for running the experiments are provided in the paper. |
| Software Dependencies | No | The paper mentions using ADAM Stochastic Subgradient Descent (Kingma & Ba, 2015) and a subgradient algorithm, but does not provide specific software dependencies with version numbers (e.g., programming language version, library versions). |
| Experiment Setup | Yes | We use the Gaussian kernel and select its bandwidth as well as the parameters λn and κ (and ϵ for ϵ-SVR) via 5-fold cross-validation. We solve this 1D regression problem using the κ-Huber loss, described in Appendix G. We learn the sketched kernel machines for different values of s (from 40 to 140) and several values of p, the probability of being non-null in a p-SR sketch. Figure 1(a) presents the test error as a function of the sketch size s. Figure 1(b) shows the corresponding computational training time. We apply a subgradient algorithm to minimize the pinball loss described in Appendix G with ridge regularization and a kernel K = kM, with M as discussed in Example 1 and k a Gaussian kernel. We select the regularisation parameter λn and the bandwidth σ² of the kernel via 5-fold cross-validation. We used the multi-output Kernel Ridge Regression framework with an input Gaussian kernel and an operator M = Id. Results are averages over 30 replicates for sketched models. |
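The pseudocode quoted in the table (Algorithm 1) can be sketched in NumPy. This is a minimal illustration, not the authors' implementation: the function name `p_sparsified_sketch` and the `1/sqrt(s*p)` normalization (a standard choice so that the sketch is an isometry in expectation) are assumptions; the paper's exact scaling may differ. The demo at the end also verifies the decomposition S = S_SG · S_SS mentioned in the quote, with S_SS the sub-sampling matrix selecting the non-null columns.

```python
import numpy as np

def p_sparsified_sketch(s, n, p, rademacher=True, seed=None):
    """Generate a p-sparsified sketch (Algorithm 1, paraphrased).

    Returns the compact sub-Gaussian factor S_SG (s x d) and the indices
    of the d non-null columns, so that the full s x n sketch factors as
    S = S_SG @ S_SS, with S_SS sub-sampling those columns.
    """
    rng = np.random.default_rng(seed)
    # s x n matrix B of i.i.d. Bernoulli(p) entries
    B = rng.binomial(1, p, size=(s, n))
    # indices of the non-null columns of B
    indices = np.flatnonzero(B.any(axis=0))
    # B' : B with all null columns deleted
    B_prime = B[:, indices]
    # M_SG: i.i.d. Rademacher (+-1) or Gaussian entries, same shape as B'
    if rademacher:
        M_SG = rng.choice([-1.0, 1.0], size=B_prime.shape)
    else:
        M_SG = rng.standard_normal(B_prime.shape)
    # Hadamard (component-wise) product; 1/sqrt(s*p) scaling is an assumed
    # normalization making E[S^T S] = I_n
    S_SG = (M_SG * B_prime) / np.sqrt(s * p)
    return S_SG, indices

# Demo: rebuild the full sketch from the compact factors
s, n, p = 5, 20, 0.3
S_SG, idx = p_sparsified_sketch(s, n, p, seed=0)
S_SS = np.zeros((len(idx), n))
S_SS[np.arange(len(idx)), idx] = 1.0   # sub-sampling matrix
S_full = S_SG @ S_SS                   # full s x n p-sparsified sketch
```

Storing only `S_SG` and `idx` is what makes the decomposition useful in practice: downstream computations only ever touch the `len(idx)` selected columns of the kernel matrix.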