Near-optimal rate of consistency for linear models with missing values
Authors: Alexis Ayme, Claire Boyer, Aymeric Dieuleveut, Erwan Scornet
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Numerical experiments highlight the benefits of our method compared to state-of-the-art algorithms used for predictions with missing values. In this section, we numerically evaluate the performance of several regressors on varying missing data scenarios. |
| Researcher Affiliation | Academia | ¹Sorbonne Université, CNRS, Laboratoire de Probabilités, Statistique et Modélisation (LPSM), F-75005 Paris, France; ²MOKAPLAN, INRIA Paris; ³CMAP, UMR7641, École Polytechnique, IP Paris, 91128 Palaiseau, France. |
| Pseudocode | No | The paper does not contain any clearly labeled pseudocode blocks or algorithms in a structured format. |
| Open Source Code | Yes | The codes of our numerical experiments are all available at github.com/AlexisAyme/minimax_linear_na. |
| Open Datasets | Yes | We consider three different settings in dimension d = 8 with increasing difficulty: (a) MCAR Bernoulli... (b) MAR... (c) MNAR-GPMM... In order for the simulations to be reproducible, here are the useful parameters to generate the dataset of Section 5. ... We ran experiments on the real superconductivity dataset (d = 81)... |
| Dataset Splits | No | The paper mentions "Number of training samples" in its figures and analysis, but it does not specify explicit training, validation, or test dataset splits by percentages, counts, or by referencing predefined splits for reproducibility of the partitioning. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., CPU, GPU models, memory) used to run the experiments. While it discusses training time, it lacks hardware specifications. |
| Software Dependencies | No | The paper mentions using "scikit-learn Iterative Imputer" but does not specify the version number of this library or any other software dependencies needed for reproducibility. |
| Experiment Setup | Yes | for P-by-P imp (i.e., τ = 1/n, which matches the regressor in (4)), and with τ = d/n for Thresholded P-by-P imp. For both, the technical ℓ -ball condition is not considered in numerical experiments. The curve represents the averaged excess risk over 100 repetitions within a 95% confidence interval. In order for the simulations to be reproducible, here are the useful parameters to generate the dataset of Section 5. |
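
The setup row above mentions an MCAR Bernoulli missingness scenario in dimension d = 8 and an excess risk averaged over 100 repetitions with a 95% confidence interval. The following is a minimal sketch of those two ingredients, not the authors' code. The sample size, the missingness probability, the function names `mcar_bernoulli_mask` and `mean_with_ci95`, and the normal-approximation interval are all assumptions made for illustration, and the per-repetition risks are placeholder draws rather than results from the paper.

```python
import numpy as np


def mcar_bernoulli_mask(n, d, p_missing, rng):
    """Boolean mask (True = missing): each entry is missing
    independently with probability p_missing (MCAR Bernoulli)."""
    return rng.random((n, d)) < p_missing


def mean_with_ci95(values):
    """Sample mean and half-width of a normal-approximation 95% CI.
    (Assumption: the paper does not state how its interval is computed.)"""
    values = np.asarray(values, dtype=float)
    mean = values.mean()
    half_width = 1.96 * values.std(ddof=1) / np.sqrt(len(values))
    return mean, half_width


rng = np.random.default_rng(0)

# d = 8 comes from the text; n and p_missing are illustrative choices.
d, n, p_missing = 8, 500, 0.3
X = rng.standard_normal((n, d))
mask = mcar_bernoulli_mask(n, d, p_missing, rng)
X_missing = np.where(mask, np.nan, X)  # NaN marks a missing entry

# Placeholder per-repetition excess risks (100 repetitions, as in the text).
# In a real run, each value would come from fitting and evaluating a regressor.
excess_risks = rng.exponential(scale=0.1, size=100)
mean_risk, half_width = mean_with_ci95(excess_risks)
print(f"excess risk: {mean_risk:.4f} +/- {half_width:.4f}")
```

In an actual reproduction, the placeholder array would be replaced by the excess risk measured in each repetition, and the data-generating parameters would be taken from the paper's appendix or repository.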