Generalization error of spectral algorithms
Authors: Maksim Velikanov, Maxim Panov, Dmitry Yarotsky
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The figure contains 3 types of data that are computed in different ways. The first type is scatter plot markers and corresponds to the estimation of generalization loss via direct simulation. For Wishart and Cosine Wishart (see Section F.3) models, this amounts to sampling empirical kernel matrix $K$ and observation vector $y$, calculating the generalization error for the resulting sampled realization, and finally averaging the result over $n = 100$ repetitions of the above procedure to estimate the expectation over training dataset $D_N$ in (3). |
| Researcher Affiliation | Collaboration | Maksim Velikanov (1, 2), Maxim Panov (3), Dmitry Yarotsky (4). Affiliations: (1) Technology Innovation Institute, (2) Ecole Polytechnique, (3) MBZUAI, (4) Skoltech. Contact: maksim.velikanov@tii.ae, maxim.panov@mbzuai.ac.ae, d.yarotsky@skoltech.ru |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. The derivations are presented mathematically. |
| Open Source Code | No | The paper does not provide any explicit statements about releasing source code or links to a code repository for the methodology described. |
| Open Datasets | No | The paper defines and uses custom data models (Wishart model, Circle model) and mentions simulating data. However, it does not refer to any pre-existing, publicly available datasets with concrete access information (like a link, DOI, or formal citation). |
| Dataset Splits | No | The paper does not mention using a 'validation' set or specify any training/validation/test splits of a dataset. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used for running the experiments, such as CPU/GPU models, memory, or specific computing environments. |
| Software Dependencies | No | The paper does not list any specific software dependencies with version numbers required for reproducibility. |
| Experiment Setup | Yes | Let us start by describing the experiment setting and details. Both KRR and GF plots use optimally scaled regularization $\eta$ and time $t$, as derived in Section C. For all three data models, we consider ideal power-law population spectrum: $\lambda_l = l^{-\nu}$, $c_l^2 = l^{-\kappa-1}$ (truncated at $P = 4 \cdot 10^4$ due to computational limitations), and an adapted version $\lambda_l = (2(\lvert l \rvert + 1))^{-\nu}$, $\lvert c_l \rvert^2 = (2(\lvert l \rvert + 1))^{-\kappa-1}$, $l \in \mathbb{Z}$ for Circle model. |
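
The quoted setup can be illustrated with a minimal sketch. It is not the authors' code, which the paper does not release. It assumes decaying power-law exponents ($\lambda_l = l^{-\nu}$, $c_l^2 = l^{-\kappa-1}$), i.i.d. Gaussian features as a stand-in for the Wishart model, a noiseless target, and the ridge convention $(K + N\eta I)^{-1}$. All function names, the values of `nu`, `kappa`, `eta`, and the small sizes `P` and `N` are hypothetical choices made to keep the example fast.

```python
import numpy as np


def power_law_spectrum(P, nu, kappa):
    """Truncated power-law spectrum: lambda_l = l^-nu, c_l^2 = l^-(kappa+1), l = 1..P."""
    l = np.arange(1, P + 1, dtype=float)
    return l ** (-nu), l ** (-kappa - 1.0)


def circle_spectrum(L, nu, kappa):
    """Adapted spectrum for l in {-L, ..., L}, using base 2(|l| + 1)."""
    l = np.arange(-L, L + 1)
    base = 2.0 * (np.abs(l) + 1.0)
    return l, base ** (-nu), base ** (-kappa - 1.0)


def krr_error_once(lam, c2, N, eta, rng):
    """One sampled realization of the KRR generalization error.

    Assumptions (not taken from the paper): Gaussian features,
    noiseless targets, regularization entering as K + N * eta * I.
    """
    P = lam.size
    c = np.sqrt(c2)                      # target coefficients in the eigenbasis
    Phi = rng.standard_normal((N, P))    # feature values phi_l(x_i)
    K = (Phi * lam) @ Phi.T              # empirical kernel matrix
    y = Phi @ c                          # observation vector
    alpha = np.linalg.solve(K + N * eta * np.eye(N), y)
    c_hat = lam * (Phi.T @ alpha)        # learned coefficients in the eigenbasis
    return float(np.sum((c_hat - c) ** 2))


def mean_krr_error(lam, c2, N, eta, n_rep=100, seed=0):
    """Average the sampled error over n_rep independent training sets."""
    rng = np.random.default_rng(seed)
    errors = [krr_error_once(lam, c2, N, eta, rng) for _ in range(n_rep)]
    return float(np.mean(errors))


# Small illustrative run; the paper truncates at P = 4 * 10^4 instead.
lam, c2 = power_law_spectrum(P=500, nu=1.5, kappa=1.0)
err = mean_krr_error(lam, c2, N=100, eta=1e-3, n_rep=20)
```

Averaging over independent draws mirrors the "direct simulation" markers described in the Research Type row; the optimal scaling of $\eta$ from Section C is not reproduced here, so `eta` is fixed by hand.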