Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Towards Theoretical Understanding of Learning Large-scale Dependent Data via Random Features

Authors: Chao Wang, Xin Bing, Xin He, Caixing Wang

ICML 2024

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive numerical experiments on both synthetic and real examples further validate our theoretical findings and support the effectiveness of the KRR with RFs in dealing with dependent data.
Researcher Affiliation	Academia	(1) School of Statistics and Management, Shanghai University of Finance and Economics, Shanghai, China; (2) Department of Statistical Sciences, University of Toronto.
Pseudocode	No	The paper provides mathematical formulations for its methods, such as the KRR estimator (Equation 2) and KRR-RF (Equation 6), but it does not include any pseudocode or clearly labeled algorithm blocks.
Open Source Code	Yes	The Python code for reproducing the numerical experiments is available in https://github.com/wangchao-afk/KRR-RF-DP.
Open Datasets	Yes	This dataset is available in https://www.kaggle.com/datasets/sumanthvrao/daily-climate-time-series-data/data.
Dataset Splits	Yes	The parameter λ is chosen from $\{10^{-7}, 10^{-6}, 10^{-5}, 10^{-4}, 10^{-3}\}$ via cross-validation, and the performance of the estimator $\hat{f}$ is evaluated by the prediction error $\|\hat{f} - f_\rho\|_m = \sqrt{\frac{1}{m}\sum_{i=1}^{m}\big(\hat{f}(x_i) - f_\rho(x_i)\big)^2}$ using new test data of size m drawn from the specified model. ... Specifically, we chose the first 1462 samples from 2013 to 2016 as the training data and the remaining 114 samples in 2017 as the test data.
Hardware Specification	No	The paper does not provide any specific details about the hardware (e.g., CPU, GPU models, or memory) used for running the experiments.
Software Dependencies	No	The paper mentions that the code is in Python ('The Python code for reproducing the numerical experiments is available...'), but it does not specify any version numbers for Python itself or for any specific libraries or software dependencies used.
Experiment Setup	Yes	The parameter λ is chosen from $\{10^{-7}, 10^{-6}, 10^{-5}, 10^{-4}, 10^{-3}\}$ via cross-validation.
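The setup rows above mention kernel ridge regression with random features (KRR-RF), a λ grid searched by cross-validation, and a test-set prediction error. The following is a minimal sketch under stated assumptions, not the authors' code (their implementation is in the linked repository). Only the λ grid and the error formula come from the table. The Gaussian kernel approximated by random Fourier features (the construction of Rahimi and Recht), the ridge scaling nλ, the contiguous-block K-fold cross-validation, and the synthetic AR(1) inputs with target sin(x) are assumptions chosen for illustration, and the helper names (`random_fourier_features`, `fit_krr_rf`, `select_lambda`, `ar1_inputs`) are hypothetical.

```python
import numpy as np

# λ grid as reported in the table; everything else below is an illustrative assumption.
LAMBDA_GRID = [1e-7, 1e-6, 1e-5, 1e-4, 1e-3]


def random_fourier_features(X, omega, b):
    """Map X (n x d) to D features whose inner products approximate a Gaussian kernel.

    Assumes omega ~ N(0, sigma^-2 I) with shape (d, D) and b ~ Uniform[0, 2*pi] with shape (D,).
    """
    D = omega.shape[1]
    return np.sqrt(2.0 / D) * np.cos(X @ omega + b)


def fit_krr_rf(Z, y, lam):
    """Closed-form ridge solution of min_w (1/n)||y - Z w||^2 + lam * ||w||^2."""
    n, D = Z.shape
    A = Z.T @ Z + n * lam * np.eye(D)
    return np.linalg.solve(A, Z.T @ y)


def prediction_error(f_hat, f_rho):
    """Empirical error sqrt((1/m) * sum_i (f_hat(x_i) - f_rho(x_i))^2)."""
    diff = np.asarray(f_hat, dtype=float) - np.asarray(f_rho, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))


def select_lambda(Z, y, grid=LAMBDA_GRID, n_folds=5):
    """Choose lam by K-fold CV over contiguous (unshuffled) blocks; an assumed CV scheme."""
    all_idx = np.arange(len(y))
    folds = np.array_split(all_idx, n_folds)
    best_lam, best_err = None, np.inf
    for lam in grid:
        errs = []
        for val_idx in folds:
            train_idx = np.setdiff1d(all_idx, val_idx)
            w = fit_krr_rf(Z[train_idx], y[train_idx], lam)
            # Validation error against noisy labels, since f_rho is unknown in practice.
            errs.append(prediction_error(Z[val_idx] @ w, y[val_idx]))
        mean_err = float(np.mean(errs))
        if mean_err < best_err:
            best_lam, best_err = lam, mean_err
    return best_lam


def ar1_inputs(n, phi, rng):
    """Hypothetical dependent design: scalar AR(1) inputs x_t = phi * x_{t-1} + e_t."""
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + rng.standard_normal()
    return x.reshape(-1, 1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, m, D, sigma = 500, 200, 200, 1.0
    X = ar1_inputs(n + m, phi=0.5, rng=rng)
    f_rho = np.sin(X[:, 0])  # hypothetical regression function
    y = f_rho + 0.1 * rng.standard_normal(n + m)
    omega = rng.standard_normal((X.shape[1], D)) / sigma
    b = rng.uniform(0.0, 2.0 * np.pi, size=D)
    Z = random_fourier_features(X, omega, b)
    # Chronological split: first n samples train, last m samples test.
    lam = select_lambda(Z[:n], y[:n])
    w = fit_krr_rf(Z[:n], y[:n], lam)
    print("chosen lambda:", lam, "test error:", prediction_error(Z[n:] @ w, f_rho[n:]))
```

Contiguous folds are used instead of shuffled ones so that each validation block preserves the temporal order of the dependent samples; whether the paper's cross-validation does the same is not stated in the table.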