Classification of Heavy-tailed Features in High Dimensions: a Superstatistical Approach

Authors: Urte Adomaityte, Gabriele Sicuro, Pierpaolo Vivo

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "In Fig. 1 we present the results of our numerical experiments using the square loss and small regularisation. An excellent agreement between the theoretical predictions and the results of numerical experiments is found for a range of values of a and sample complexity α, both for balanced, i.e., equally sized, and unbalanced clusters of data (the plot for this case can be found in Appendix A.3)."
Researcher Affiliation | Academia | Urte Adomaityte, Department of Mathematics, King's College London, urte.adomaityte@kcl.ac.uk; Gabriele Sicuro, Department of Mathematics, King's College London, gabriele.sicuro@kcl.ac.uk; Pierpaolo Vivo, Department of Mathematics, King's College London, pierpaolo.vivo@kcl.ac.uk
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide concrete access to source code for the methodology described.
Open Datasets | No | The paper describes generating synthetic datasets according to specified distributions (e.g., "The synthetic data sets will be produced using an inverse-Gamma-distributed variance Δ...") rather than using a publicly available dataset with concrete access information.
Dataset Splits | No | The paper discusses sample sizes and dimensionality in the context of high-dimensional limits (e.g., "sample size n and the dimensionality d are both sent to infinity, with n/d = α kept constant") and mentions training and test errors, but does not provide specific dataset split information (percentages, sample counts) for reproducibility of finite-sample experiments.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts) used for running its experiments.
Software Dependencies | No | The paper mentions software like "scikit-learn" [58] and "SciPy" [73] in its references, but does not specify the version numbers of these or any other ancillary software components used for the experiments.
Experiment Setup | Yes | "We compare our theoretical predictions with the results of numerical experiments for a large family of data distributions. The results have been obtained using ridge ℓ2-regularisation, and both quadratic and logistic losses, with various data cluster balances. We will also assume, without loss of generality, that μ = m/√d, where m ∼ N(0, I_d). The synthetic data sets will be produced using an inverse-Gamma-distributed variance Δ, with density parametrised as r(Δ) ≡ r_{a,c}(Δ) = c^a / (Γ(a) Δ^{a+1}) e^{−c/Δ}, depending on the shape parameter a > 0 and on the scale parameter c > 0. ... square loss and small regularisation. An excellent agreement ... with λ = 10^{−5}." (Fig. 1 caption)
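The experiment setup above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code (which is not released): it draws a two-cluster Gaussian mixture whose per-sample variance Δ is inverse-Gamma-distributed with shape a and scale c, then fits a small-regularisation ridge classifier. The dimensions, the values a = 2, c = 1, and the model x = y·μ + √Δ·z are illustrative assumptions consistent with the setup quoted from the paper.

```python
import numpy as np
from scipy.stats import invgamma
from sklearn.linear_model import RidgeClassifier

rng = np.random.default_rng(0)

d = 100          # dimensionality (illustrative)
n = 1000         # sample size, so sample complexity alpha = n/d = 10
a, c = 2.0, 1.0  # inverse-Gamma shape and scale (illustrative values)

# Cluster mean with the 1/sqrt(d) scaling described in the paper:
# mu = m / sqrt(d), with m ~ N(0, I_d)
mu = rng.standard_normal(d) / np.sqrt(d)

# Balanced labels y = +/-1 for the two clusters
y = rng.choice([-1, 1], size=n)

# Each sample gets its own inverse-Gamma-distributed variance Delta;
# scipy's invgamma with scale=c has density c^a / (Gamma(a) x^(a+1)) e^(-c/x)
Delta = invgamma.rvs(a, scale=c, size=n, random_state=0)

# Superstatistical mixture: x = y * mu + sqrt(Delta) * z, z ~ N(0, I_d)
X = y[:, None] * mu[None, :] + np.sqrt(Delta)[:, None] * rng.standard_normal((n, d))

# Square loss with small ridge (ell_2) regularisation, lambda = 1e-5 as in Fig. 1
clf = RidgeClassifier(alpha=1e-5).fit(X, y)
train_error = np.mean(clf.predict(X) != y)
```

Sweeping `a` (the tail exponent of the variance distribution) and `n/d` in this sketch mirrors the ranges the paper compares against its theoretical predictions.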