Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Classification of Heavy-tailed Features in High Dimensions: a Superstatistical Approach
Authors: Urte Adomaityte, Gabriele Sicuro, Pierpaolo Vivo
NeurIPS 2023 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In Fig. 1 we present the results of our numerical experiments using the square loss and small regularisation. An excellent agreement between the theoretical predictions and the results of numerical experiments is found for a range of values of $a$ and sample complexity $\alpha$, both for balanced, i.e., equally sized, and unbalanced clusters of data (the plot for this case can be found in Appendix A.3). |
| Researcher Affiliation | Academia | Urte Adomaityte, Department of Mathematics, King's College London, EMAIL; Gabriele Sicuro, Department of Mathematics, King's College London, EMAIL; Pierpaolo Vivo, Department of Mathematics, King's College London, EMAIL |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described. |
| Open Datasets | No | The paper describes generating synthetic datasets according to specified distributions (e.g., "The synthetic data sets will be produced using, an inverse-Gamma-distributed variance $\Delta$...") rather than using a publicly available dataset with concrete access information. |
| Dataset Splits | No | The paper discusses sample sizes and dimensionality in the context of high-dimensional limits (e.g., "sample size $n$ and the dimensionality $d$ are both sent to infinity, with $n/d = \alpha$ kept constant.") and mentions training and test errors, but does not provide specific dataset split information (percentages, sample counts) for reproducibility of finite-sample experiments. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper mentions software like "scikit-learn" [58] and "SciPy" [73] in its references, but does not specify the version numbers of these or any other ancillary software components used for the experiments. |
| Experiment Setup | Yes | We compare our theoretical predictions with the results of numerical experiments for a large family of data distributions. The results have been obtained using ridge $\ell_2$-regularisation, and both quadratic and logistic losses, with various data cluster balances $\rho$. We will also assume, without loss of generality, that $\boldsymbol{\mu} = \frac{1}{\sqrt{d}}\mathbf{g}$, where $\mathbf{g} \sim \mathcal{N}(\mathbf{0}, \mathbf{I}_d)$. The synthetic data sets will be produced using, an inverse-Gamma-distributed variance $\Delta$, with density parametrised as $\varrho(\Delta) \equiv \varrho_{a,c}(\Delta) = \frac{c^a}{\Gamma(a)\Delta^{a+1}} e^{-c/\Delta}$ depending on the shape parameter $a > 0$ and on the scale parameter $c > 0$. ... square loss and small regularisation. An excellent agreement... with $\lambda = 10^{-5}$. (Fig. 1 caption). |
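
The paper does not release code, so the following is only a minimal sketch of the kind of finite-size experiment the Experiment Setup row describes, not the authors' implementation. It assumes two clusters centred at plus and minus a random centroid scaled by one over the square root of the dimension, a per-sample inverse-Gamma variance with a shape and a scale parameter, and ridge-regularised square loss solved in closed form without a bias term. All function names, the default values of `d`, `alpha`, `a`, `c` and `rho`, and the size of the test set are hypothetical choices made for illustration; only the small regularisation strength `1e-5` is taken from the Fig. 1 caption quoted above.

```python
import numpy as np


def sample_superstatistical_mixture(n, d, a, c, rho, mu, rng):
    """Draw n labelled points x = y * mu + sqrt(Delta) * z in d dimensions.

    Assumed model (illustrative): y = +1 with probability rho, else -1;
    Delta is inverse-Gamma with shape a and scale c; z is standard Gaussian.
    """
    y = np.where(rng.random(n) < rho, 1.0, -1.0)
    # If G ~ Gamma(shape=a, rate=c), then 1/G ~ inverse-Gamma(shape=a, scale=c).
    delta = 1.0 / rng.gamma(shape=a, scale=1.0 / c, size=n)
    z = rng.standard_normal((n, d))
    x = y[:, None] * mu[None, :] + np.sqrt(delta)[:, None] * z
    return x, y, delta


def fit_ridge_square_loss(x, y, lam):
    """Minimise sum_i (y_i - w.x_i)^2 + lam * ||w||^2 in closed form (no bias)."""
    d = x.shape[1]
    return np.linalg.solve(x.T @ x + lam * np.eye(d), x.T @ y)


def run_experiment(d=200, alpha=2.0, a=2.0, c=1.0, rho=0.5, lam=1e-5, seed=0):
    """Train on n = alpha * d samples and estimate the test error on fresh data."""
    rng = np.random.default_rng(seed)
    n = int(alpha * d)
    # Random centroid with norm of order one, as assumed in the lead-in.
    mu = rng.standard_normal(d) / np.sqrt(d)
    x_train, y_train, _ = sample_superstatistical_mixture(n, d, a, c, rho, mu, rng)
    x_test, y_test, _ = sample_superstatistical_mixture(10 * n, d, a, c, rho, mu, rng)
    w = fit_ridge_square_loss(x_train, y_train, lam)
    # Fraction of fresh samples whose predicted sign disagrees with the label.
    test_error = float(np.mean(np.sign(x_test @ w) != y_test))
    return w, test_error


if __name__ == "__main__":
    for shape in (1.5, 3.0, 10.0):
        _, err = run_experiment(a=shape)
        print(f"shape a = {shape}: empirical test error = {err:.3f}")
```

Fixing `seed` makes a run repeatable, and recording the sample counts and library versions alongside it would address the Dataset Splits and Software Dependencies gaps noted in the table. Smaller values of the shape parameter give heavier-tailed variances, so sweeping `a` and `alpha` is one way to probe the regime the paper studies.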