Optimal Tuning for Divide-and-conquer Kernel Ridge Regression with Massive Data
Authors: Ganggang Xu, Zuofeng Shang, Guang Cheng
ICML 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we conduct a simulation study to illustrate the effectiveness of dGCV(λ) in choosing the optimal λ for the divide-and-conquer function estimator. (Section 5), In this section, we applied the dGCV tuning method to the Million Song Dataset, which consist of 463,715 training examples and 51,630 testing examples. (Section 6) |
| Researcher Affiliation | Academia | Ganggang Xu. ¹Department of Mathematical Sciences, Binghamton University, the State University of New York, Binghamton, NY, USA; ²Department of Mathematical Sciences, IUPUI, Indianapolis, IN, USA; ³Department of Statistics, Purdue University, West Lafayette, IN, USA |
| Pseudocode | No | No pseudocode or clearly labeled algorithm block is present. |
| Open Source Code | No | The paper does not provide concrete access to source code or explicitly state its availability. |
| Open Datasets | Yes | The Million Song Dataset, which consist of 463,715 training examples and 51,630 testing examples. ... We refer to Bertin-Mahieux et al. (2011) for more details on this data set. (Section 6) |
| Dataset Splits | No | The paper describes random partitioning into sub-datasets for the divide-and-conquer strategy and specifies training/testing sizes for the Million Song Dataset, but does not explicitly mention or detail a validation split with specific percentages or counts. |
| Hardware Specification | Yes | All simulation runs were carried out in the software R on a cluster of 100 Linux machines with a total of 100 CPU cores, with each core running at approximately 2 GFLOPS. (Section 5), The experiment was conducted in Matlab using a Windows computer with 16 GB of memory and a single-threaded 3.5 GHz CPU. (Section 6) |
| Software Dependencies | No | The paper mentions 'software R' and 'Matlab' but does not specify their version numbers or any other software dependencies with version numbers. |
| Experiment Setup | Yes | In all simulation runs, the tuning parameter λ was selected by a grid search for log(λ) over 30 equally-spaced grid points over the interval [−12, 1]. (Section 5), To find the best combination of (λ, φ) for each partition m, we perform a 2-dimensional search with λ ∈ {0.25, 0.5, 0.75, 1.0, 1.25, 1.5}/N and φ ∈ {2, 3, 4, 5, 6, 7} by minimizing (15) with K = m/10. (Section 6) |
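
The two search grids quoted in the Experiment Setup row can be written out as a minimal Python sketch. It assumes natural logarithms for log(λ) and assumes N is the Million Song training-set size; neither is stated in the excerpt. The paper's criterion (15) is not reproduced here, so `toy_criterion` is a hypothetical stand-in used only to show the search loop, and `grid_search` is an illustrative helper rather than the authors' code (the paper's experiments were run in R and Matlab).

```python
import itertools
import math

# Section 5: 30 equally spaced grid points for log(lambda) on [-12, 1].
log_lambda_grid = [-12.0 + 13.0 * i / 29 for i in range(30)]
lambda_grid = [math.exp(v) for v in log_lambda_grid]  # assumes natural log

# Section 6: lambda in {0.25, ..., 1.5}/N and phi in {2, ..., 7}.
N = 463_715  # assumption: N is the reported training-set size
lambda_candidates = [c / N for c in (0.25, 0.5, 0.75, 1.0, 1.25, 1.5)]
phi_candidates = [2, 3, 4, 5, 6, 7]


def grid_search(criterion, lambdas, phis):
    """Return the (lambda, phi) pair that minimizes a user-supplied criterion."""
    return min(itertools.product(lambdas, phis), key=lambda pair: criterion(*pair))


def toy_criterion(lam, phi):
    """Hypothetical placeholder for the paper's criterion (15)."""
    return (lam * N - 1.0) ** 2 + (phi - 4) ** 2


best_lambda, best_phi = grid_search(toy_criterion, lambda_candidates, phi_candidates)
```

The 2-dimensional search evaluates all 36 (λ, φ) combinations; in the paper's setting this would be repeated for each partition count m with the actual criterion substituted for the placeholder.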