Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Learning with Exact Invariances in Polynomial Time

Authors: Ashkan Soleymani, Behrooz Tahmasebi, Stefanie Jegelka, Patrick Jaillet

ICML 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	In this section, we provide complementary experiments to support our theoretical results. We first show that, in practice, Kernel Ridge Regression (KRR) is not a G-invariant estimator. Then, we demonstrate that our algorithm (Spec-Avg) achieves the same rate of population risk as KRR, while enjoying exact invariance properties. The results of the experiments are depicted in Figure 1 and Figure 2 in Appendix C.
Researcher Affiliation	Academia	(1) MIT EECS and MIT LIDS; (2) MIT EECS and MIT CSAIL; (3) School of CIT, MCML, and MDSI, Technical University of Munich (TUM).
Pseudocode	Yes	Pseudocode for the method is presented in Algorithm 1.
Open Source Code	No	The paper does not contain any explicit statement about releasing source code or a link to a code repository.
Open Datasets	No	The paper describes a self-generated dataset based on a target function and uniform sampling from T^d = [-1, 1)^d, but it does not refer to a publicly available or open dataset with access information (link, DOI, citation to a dataset paper).
Dataset Splits	No	The trained models are evaluated on a test dataset of size 100. Both the test and train datasets are generated uniformly from the interval [-1, 1]^d, independently and identically distributed.
Hardware Specification	No	The paper does not provide any specific hardware details such as CPU, GPU models, or cloud computing instance types used for running the experiments.
Software Dependencies	No	The paper does not provide specific software dependencies with version numbers.
Experiment Setup	Yes	We conduct our experiments for d = 10. The trained models are evaluated on a test dataset of size 100. Both the test and train datasets are generated uniformly from the interval [-1, 1]^d, independently and identically distributed. Each point in our plots represents an average over 10 different random seeds (from 1 to 10) to account for the randomness in the data generation process. ... In Figure 2 in Appendix C, we present the empirical excess population risk of KRR and Spec-Avg for different hyperparameters λ and D, respectively. ... It can be observed that Spec-Avg with D = 176 achieves the same order of performance as KRR with λ = 50.
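
The data-generation protocol quoted in the Experiment Setup row can be sketched as follows. This is a minimal illustration, not the authors' code: only d = 10, the test-set size of 100, uniform sampling on [-1, 1]^d, and the seeds 1 to 10 come from the quoted text. The target function, the training-set size, and the names `sample_uniform`, `placeholder_target`, `average_over_seeds`, and `mean_predictor_mse` are assumptions introduced for this example.

```python
import math
import random
import statistics

D_INPUT = 10          # input dimension d (from the quoted setup)
N_TEST = 100          # test-set size (from the quoted setup)
SEEDS = range(1, 11)  # random seeds 1..10 (from the quoted setup)


def sample_uniform(n, d, rng):
    """Draw n i.i.d. points, each coordinate uniform on [-1, 1]."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(d)] for _ in range(n)]


def placeholder_target(x):
    """Hypothetical target function; the report does not state the real one."""
    return sum(math.cos(math.pi * v) for v in x)


def average_over_seeds(evaluate, n_train):
    """Average evaluate(x_train, y_train, x_test, y_test) over seeds 1..10.

    n_train is an assumption: the quoted setup does not give a training size.
    """
    scores = []
    for seed in SEEDS:
        rng = random.Random(seed)  # fresh generator per seed for repeatability
        x_train = sample_uniform(n_train, D_INPUT, rng)
        x_test = sample_uniform(N_TEST, D_INPUT, rng)
        y_train = [placeholder_target(x) for x in x_train]
        y_test = [placeholder_target(x) for x in x_test]
        scores.append(evaluate(x_train, y_train, x_test, y_test))
    return statistics.mean(scores)


def mean_predictor_mse(x_train, y_train, x_test, y_test):
    """Stand-in estimator: predict the training mean, report test MSE."""
    prediction = statistics.mean(y_train)
    return statistics.mean((y - prediction) ** 2 for y in y_test)


if __name__ == "__main__":
    print(average_over_seeds(mean_predictor_mse, n_train=50))
```

Any estimator with the same four-argument signature (for instance a KRR or Spec-Avg implementation) could be passed in place of `mean_predictor_mse`; the stand-in is used here only so the sketch runs without external dependencies.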