Fast Randomized Kernel Ridge Regression with Statistical Guarantees
Authors: Ahmed El Alaoui, Michael W. Mahoney
NeurIPS 2015
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We give empirical evidence supporting this fact. Our second contribution is to present a fast algorithm to quickly compute coarse approximations to these scores in time linear in the number of samples. We test our results based on several datasets: one synthetic regression problem from [3] to illustrate the importance of the λ-ridge leverage scores, the Pumadyn family consisting of three datasets pumadyn-32fm, pumadyn-32fh and pumadyn-32nh, and the Gas Sensor Array Drift Dataset from the UCI database. |
| Researcher Affiliation | Academia | Ahmed El Alaoui (Electrical Engineering and Computer Sciences) and Michael W. Mahoney (Statistics and International Computer Science Institute), University of California, Berkeley, Berkeley, CA 94720. {elalaoui@eecs,mmahoney@stat}.berkeley.edu |
| Pseudocode | Yes | Inputs: data points $(x_i)_{1 \le i \le n}$, probability vector $(p_i)_{1 \le i \le n}$, sampling parameter $p \in \{1, 2, \dots\}$, $\lambda > 0$, $\epsilon \in (0, 1/2)$. Output: $(\tilde{l}_i)_{1 \le i \le n}$, $\epsilon$-approximations to $(l_i(\lambda))_{1 \le i \le n}$. 1. Sample $p$ data points from $(x_i)_{1 \le i \le n}$ with replacement with probabilities $(p_i)_{1 \le i \le n}$. 2. Compute the corresponding columns $K_1, \dots, K_p$ of the kernel matrix. 3. Construct $C = [K_1, \dots, K_p] \in \mathbb{R}^{n \times p}$ and $W \in \mathbb{R}^{p \times p}$ as presented in Section 2. 4. Construct $B \in \mathbb{R}^{n \times p}$ such that $BB^\top = C W^\dagger C^\top$. 5. For every $i \in \{1, \dots, n\}$, set $\tilde{l}_i = B_i^\top (B^\top B + n\lambda I)^{-1} B_i$ (9), where $B_i$ is the $i$-th row of $B$, and return it. |
| Open Source Code | No | The paper does not provide any explicit statements about releasing source code or links to a code repository. |
| Open Datasets | Yes | We test our results based on several datasets: one synthetic regression problem from [3] to illustrate the importance of the λ-ridge leverage scores, the Pumadyn family consisting of three datasets pumadyn-32fm, pumadyn-32fh and pumadyn-32nh, and the Gas Sensor Array Drift Dataset from the UCI database. (Pumadyn: http://www.cs.toronto.edu/~delve/data/pumadyn/desc.html; Gas Sensor Array Drift: https://archive.ics.uci.edu/ml/datasets/Gas+Sensor+Array+Drift+Dataset) |
| Dataset Splits | Yes | For all datasets, we determine λ and the bandwidth of k by cross validation, and we compute the effective dimensionality $d_{\text{eff}}$ and the maximal degrees of freedom $d_{\text{mof}}$. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., CPU, GPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies or libraries used in the experiments. |
| Experiment Setup | Yes | For all datasets, we determine λ and the bandwidth of k by cross validation, and we compute the effective dimensionality $d_{\text{eff}}$ and the maximal degrees of freedom $d_{\text{mof}}$. Table 1 summarizes the experiments. It is often the case that $d_{\text{eff}} \ll d_{\text{mof}}$ and $R(\hat{f}_L)/R(\hat{f}_K) \simeq 1$, in agreement with Theorem 3. The synthetic case consists of a regression problem on the interval $X = [0, 1]$ where, given a sequence $(x_i)_{1 \le i \le n}$ and a sequence of noise $(\epsilon_i)_{1 \le i \le n}$, we observe the sequence $y_i = f(x_i) + \sigma^2 \epsilon_i$, $i \in \{1, \dots, n\}$. The number of observations is $n = 500$. |
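
The five steps quoted in the Pseudocode row can be sketched in NumPy as follows. This is a minimal illustration, not the authors' implementation (the table notes that no code was released). It rests on several assumptions: the Gaussian kernel, the function names (`rbf_kernel`, `approx_ridge_leverage_scores`, `exact_ridge_leverage_scores`) and the parameters `bandwidth`, `rel_tol` and `seed` are all hypothetical. Any column rescaling that Section 2 of the paper applies when building C and W is omitted. The exact scores are taken to be the diagonal of $K(K + n\lambda I)^{-1}$, which is included only for comparison.

```python
import numpy as np


def rbf_kernel(X, Y, bandwidth=1.0):
    """Gaussian kernel matrix between rows of X and Y (an assumed kernel choice)."""
    sq_dists = (
        np.sum(X**2, axis=1)[:, None]
        + np.sum(Y**2, axis=1)[None, :]
        - 2.0 * X @ Y.T
    )
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth**2))


def approx_ridge_leverage_scores(X, probs, p, lam, bandwidth=1.0,
                                 rel_tol=1e-8, seed=None):
    """Approximate ridge leverage scores from p sampled kernel columns."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]

    # Step 1: sample p indices with replacement according to probs.
    idx = rng.choice(n, size=p, replace=True, p=probs)

    # Steps 2-3: C holds the sampled kernel columns (n x p);
    # W is the p x p block of the kernel matrix on the sampled points.
    C = rbf_kernel(X, X[idx], bandwidth)
    W = C[idx, :]

    # Step 4: build B with B B^T = C W^+ C^T via an eigendecomposition of W,
    # dropping near-zero eigenvalues (this realizes the pseudo-inverse).
    evals, evecs = np.linalg.eigh((W + W.T) / 2.0)
    keep = evals > rel_tol * evals.max()
    B = C @ (evecs[:, keep] / np.sqrt(evals[keep]))

    # Step 5: score_i = B_i^T (B^T B + n*lam*I)^{-1} B_i for every row B_i.
    G = B.T @ B + n * lam * np.eye(B.shape[1])
    return np.einsum("ij,ji->i", B, np.linalg.solve(G, B.T))


def exact_ridge_leverage_scores(K, lam):
    """Diagonal of K (K + n*lam*I)^{-1}; costs O(n^3), for comparison only."""
    n = K.shape[0]
    return np.diag(np.linalg.solve(K + n * lam * np.eye(n), K))


# Illustrative run on synthetic 1-D inputs (sizes and parameters are arbitrary).
demo_rng = np.random.default_rng(0)
X = demo_rng.uniform(0.0, 1.0, size=(200, 1))
probs = np.full(200, 1.0 / 200)
approx = approx_ridge_leverage_scores(X, probs, p=20, lam=1e-2,
                                      bandwidth=0.2, seed=1)
exact = exact_ridge_leverage_scores(rbf_kernel(X, X, 0.2), lam=1e-2)
```

Only the n × p matrix C is ever formed, so the cost of the sketch grows linearly in n for fixed p, in line with the "time linear in the number of samples" claim quoted in the table. Because the Nyström matrix $C W^\dagger C^\top$ is dominated by $K$ in the positive semidefinite order, the approximate scores should not exceed the exact ones beyond rounding error.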