Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Generalization error of spectral algorithms

Authors: Maksim Velikanov, Maxim Panov, Dmitry Yarotsky

ICLR 2024 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	The figure contains 3 types of data that are computed in different ways. The first type is scatter plot markers and corresponds to the estimation of generalization loss via direct simulation. For Wishart and Cosine Wishart (see Section F.3) models, this amounts to sampling empirical kernel matrix $K$ and observation vector $y$, calculating the generalization error for the resulting sampled realization, and finally averaging the result over $n = 100$ repetitions of the above procedure to estimate the expectation over training dataset $D_N$ in (3).
Researcher Affiliation	Collaboration	Maksim Velikanov (1, 2), Maxim Panov (3), Dmitry Yarotsky (4). Affiliations: (1) Technology Innovation Institute, (2) Ecole Polytechnique, (3) MBZUAI, (4) Skoltech. EMAIL, EMAIL, EMAIL
Pseudocode	No	The paper does not contain any structured pseudocode or algorithm blocks. The derivations are presented mathematically.
Open Source Code	No	The paper does not provide any explicit statements about releasing source code or links to a code repository for the methodology described.
Open Datasets	No	The paper defines and uses custom data models (Wishart model, Circle model) and mentions simulating data. However, it does not refer to any pre-existing, publicly available datasets with concrete access information (like a link, DOI, or formal citation).
Dataset Splits	No	The paper does not mention using a 'validation' set or specify any training/validation/test splits of a dataset.
Hardware Specification	No	The paper does not provide any specific details about the hardware used for running the experiments, such as CPU/GPU models, memory, or specific computing environments.
Software Dependencies	No	The paper does not list any specific software dependencies with version numbers required for reproducibility.
Experiment Setup	Yes	Let us start by describing the experiment setting and details. Both KRR and GF plots use optimally scaled regularization $\eta$ and time $t$, as derived in Section C. For all three data models, we consider ideal power-law population spectrum: $\lambda_l = l^{-\nu}$, $c_l^2 = l^{-\kappa-1}$ (truncated at $P = 4 \cdot 10^4$ due to computational limitations), and an adapted version $\lambda_l = (2(\lvert l\rvert+1))^{-\nu}$, $\lvert c_l\rvert^2 = (2(\lvert l\rvert+1))^{-\kappa-1}$, $l \in \mathbb{Z}$ for Circle model.
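
The simulation procedure quoted in the Research Type and Experiment Setup rows (build a truncated power-law spectrum, sample a training set, compute the generalization error for that realization, then average over repetitions) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: it assumes Gaussian features with diagonal covariance as a stand-in for the paper's Wishart model (whose exact definition is in its Section F.3 and is not reproduced here), noiseless targets, a fixed regularization `eta` rather than the optimally scaled one, and much smaller `P` and `N` than the paper's P = 4 * 10^4. All function and parameter names are hypothetical.

```python
import numpy as np


def power_law_spectrum(P, nu, kappa):
    """Truncated power-law spectrum: lam_l = l^(-nu), c_l^2 = l^(-kappa-1), l = 1..P."""
    l = np.arange(1, P + 1, dtype=float)
    lam = l ** (-nu)
    c2 = l ** (-kappa - 1.0)
    return lam, c2


def sample_krr_error(lam, c2, N, eta, rng):
    """Population error of kernel ridge regression for one sampled training set.

    Assumption: features are Gaussian, phi_l(x) = sqrt(lam_l) * z_l with z ~ N(0, I),
    and the noiseless target is f*(x) = sum_l c_l * z_l.
    """
    P = lam.size
    c = np.sqrt(c2)
    Z = rng.standard_normal((N, P))      # orthonormal-basis coordinates of the inputs
    Phi = Z * np.sqrt(lam)               # feature matrix, shape (N, P)
    y = Z @ c                            # noiseless observations
    K = Phi @ Phi.T                      # empirical kernel matrix, shape (N, N)
    alpha = np.linalg.solve(K + N * eta * np.eye(N), y)
    c_hat = np.sqrt(lam) * (Phi.T @ alpha)   # learned coefficients in the z basis
    return float(np.sum((c_hat - c) ** 2))   # E_x (f(x) - f*(x))^2


def estimate_generalization_error(lam, c2, N, eta, n_rep=100, seed=0):
    """Monte Carlo average of the per-realization error over n_rep training sets."""
    rng = np.random.default_rng(seed)
    errs = [sample_krr_error(lam, c2, N, eta, rng) for _ in range(n_rep)]
    return float(np.mean(errs))


if __name__ == "__main__":
    lam, c2 = power_law_spectrum(P=200, nu=1.5, kappa=1.0)
    err = estimate_generalization_error(lam, c2, N=50, eta=1e-3, n_rep=100)
    print(f"estimated generalization error: {err:.4g}")
```

Averaging over `n_rep` independent draws mirrors the "n = 100 repetitions" described above; the seed is fixed so that repeated runs return the same estimate.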