reproducibilityindex.ai

Near-optimal rate of consistency for linear models with missing values

Authors: Alexis Ayme, Claire Boyer, Aymeric Dieuleveut, Erwan Scornet

ICML 2022

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Numerical experiments highlight the benefits of our method compared to state-of-the-art algorithms used for predictions with missing values. In this section, we numerically evaluate the performance of several regressors on varying missing data scenarios.
Researcher Affiliation	Academia	¹Sorbonne Université, CNRS, Laboratoire de Probabilités, Statistique et Modélisation (LPSM), F-75005 Paris, France; ²MOKAPLAN, INRIA Paris; ³CMAP, UMR7641, École Polytechnique, IP Paris, 91128 Palaiseau, France.
Pseudocode	No	The paper does not contain any clearly labeled pseudocode blocks or algorithms in a structured format.
Open Source Code	Yes	The codes of our numerical experiments are all available on github.com/AlexisAyme/minimax_linear_na.
Open Datasets	Yes	We consider three different settings in dimension d = 8 with increasing difficulty: (a) MCAR Bernoulli... (b) MAR... (c) MNAR-GPMM... In order for the simulations to be reproducible, here are the useful parameters to generate the dataset of Section 5. ... We ran experiments on the real superconductivity dataset (d = 81)...
Dataset Splits	No	The paper mentions "Number of training samples" in its figures and analysis, but it does not specify explicit training, validation, or test dataset splits by percentages, counts, or by referencing predefined splits for reproducibility of the partitioning.
Hardware Specification	No	The paper does not provide any specific details about the hardware (e.g., CPU, GPU models, memory) used to run the experiments. While it discusses training time, it lacks hardware specifications.
Software Dependencies	No	The paper mentions using "scikit-learn Iterative Imputer" but does not specify the version number of this library or any other software dependencies needed for reproducibility.
Experiment Setup	Yes	for P-by-P imp (i.e., τ = 1/n, which matches the regressor in (4)), and with τ = d/n for Thresholded P-by-P imp. For both, the technical ℓ -ball condition is not considered in numerical experiments. The curve represents the averaged excess risk over 100 repetitions within a 95% confidence interval. In order for the simulations to be reproducible, here are the useful parameters to generate the dataset of Section 5.
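
The experiment setup quoted above (a pattern-by-pattern regressor with a frequency threshold τ, evaluated by averaged excess risk with a 95% confidence interval) can be illustrated with a minimal sketch. This is not the authors' code. It assumes an MCAR Bernoulli mask on isotropic Gaussian features in dimension d = 8, one least-squares fit with an intercept per retained missingness pattern, a zero prediction for dropped or unseen patterns, and a normal-approximation interval. The function names, `p_missing`, `noise_std`, `beta`, the sample size, and the use of 20 repetitions instead of 100 are all hypothetical choices made to keep the example short and fast.

```python
import numpy as np

D = 8  # dimension of the synthetic settings described above


def make_mcar_data(n, rng, p_missing=0.3, noise_std=0.5):
    """Gaussian features, linear response, MCAR Bernoulli mask.

    p_missing, noise_std and beta are illustrative assumptions,
    not the parameters reported in the paper.
    """
    beta = np.ones(D)
    X = rng.standard_normal((n, D))
    y = X @ beta + noise_std * rng.standard_normal(n)
    mask = rng.random((n, D)) < p_missing  # True marks a missing entry
    X_obs = np.where(mask, 0.0, X)         # missing entries set to 0
    # With independent, centred features and an MCAR mask, the Bayes
    # predictor only uses the observed coordinates.
    bayes = X_obs @ beta
    return X_obs, mask, y, bayes


def fit_pattern_by_pattern(X, mask, y, tau):
    """Fit one least-squares model per pattern with frequency >= tau."""
    n = len(y)
    groups = {}
    for i, row in enumerate(mask):
        groups.setdefault(row.tobytes(), []).append(i)
    models = {}
    for key, idx in groups.items():
        if len(idx) / n < tau:
            continue  # rare pattern: dropped, prediction stays at 0
        obs = ~mask[idx[0]]
        A = np.column_stack([np.ones(len(idx)), X[idx][:, obs]])
        coef, *_ = np.linalg.lstsq(A, y[idx], rcond=None)
        models[key] = (obs, coef)
    return models


def predict(models, X, mask):
    """Predict with the model of each row's pattern, or 0 if none exists."""
    y_hat = np.zeros(len(X))
    for i, row in enumerate(mask):
        model = models.get(row.tobytes())
        if model is not None:
            obs, coef = model
            y_hat[i] = coef[0] + X[i, obs] @ coef[1:]
    return y_hat


def excess_risk(n_train, tau, rng, n_test=2000):
    """Monte Carlo estimate of E[(f(X) - f_Bayes(X))^2] on fresh data."""
    X, mask, y, _ = make_mcar_data(n_train, rng)
    models = fit_pattern_by_pattern(X, mask, y, tau)
    X_te, mask_te, _, bayes_te = make_mcar_data(n_test, rng)
    return np.mean((predict(models, X_te, mask_te) - bayes_te) ** 2)


def mean_with_ci(values, z=1.96):
    """Mean and normal-approximation 95% interval over repetitions."""
    values = np.asarray(values, dtype=float)
    half = z * values.std(ddof=1) / np.sqrt(len(values))
    return values.mean(), values.mean() - half, values.mean() + half


rng = np.random.default_rng(0)
n, reps = 500, 20  # the paper reports 100 repetitions; 20 keeps this quick
results = {}
for name, tau in [("P-by-P", 1 / n), ("Thresholded P-by-P", D / n)]:
    risks = [excess_risk(n, tau, rng) for _ in range(reps)]
    results[name] = mean_with_ci(risks)
    print(name, results[name])
```

With τ = 1/n every pattern seen in training keeps its own model, while τ = d/n only keeps patterns observed at least d times, which is the difference between the two regressors named in the table. Repeating the loop over a grid of training sizes would give curves of the kind the quoted passage describes.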