Classification of Heavy-tailed Features in High Dimensions: a Superstatistical Approach
Authors: Urte Adomaityte, Gabriele Sicuro, Pierpaolo Vivo
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In Fig. 1 we present the results of our numerical experiments using the square loss and small regularisation. An excellent agreement between the theoretical predictions and the results of numerical experiments is found for a range of values of the shape parameter a and the sample complexity α, both for balanced, i.e., equally sized, and unbalanced clusters of data (the plot for this case can be found in Appendix A.3). |
| Researcher Affiliation | Academia | Urte Adomaityte, Department of Mathematics, King's College London, urte.adomaityte@kcl.ac.uk; Gabriele Sicuro, Department of Mathematics, King's College London, gabriele.sicuro@kcl.ac.uk; Pierpaolo Vivo, Department of Mathematics, King's College London, pierpaolo.vivo@kcl.ac.uk |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described. |
| Open Datasets | No | The paper describes generating synthetic datasets according to specified distributions (e.g., "The synthetic data sets will be produced using an inverse-Gamma-distributed variance Δ...") rather than using a publicly available dataset with concrete access information. |
| Dataset Splits | No | The paper discusses sample sizes and dimensionality in the context of high-dimensional limits (e.g., "the sample size n and the dimensionality d are both sent to infinity, with n/d ≡ α kept constant") and mentions training and test errors, but does not provide specific dataset split information (percentages, sample counts) for reproducibility of finite-sample experiments. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper mentions software like "scikit-learn" [58] and "SciPy" [73] in its references, but does not specify the version numbers of these or any other ancillary software components used for the experiments. |
| Experiment Setup | Yes | We compare our theoretical predictions with the results of numerical experiments for a large family of data distributions. The results have been obtained using ridge ℓ2-regularisation, and both quadratic and logistic losses, with various data cluster balances. We will also assume, without loss of generality, that μ = η/√d, where η ∼ N(0, I_d). The synthetic data sets will be produced using an inverse-Gamma-distributed variance Δ, with density parametrised as ρ(Δ) ≡ ρ_{a,c}(Δ) = c^a / (Γ(a) Δ^{a+1}) e^{−c/Δ}, depending on the shape parameter a > 0 and on the scale parameter c > 0. ... square loss and small regularisation. An excellent agreement... with λ = 10⁻⁵. (Fig. 1 caption). |
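The setup quoted above (a two-cluster Gaussian mixture whose noise variance Δ is itself drawn from an inverse-Gamma law, classified with ridge regularisation) can be sketched in a few lines. This is a minimal illustration, not the authors' code: the concrete values of d, n, the shape/scale parameters, and the random seed are assumptions chosen for the example; only the structure (per-sample Δ ∼ InvGamma(a, c), x = y·μ + √Δ·z, ridge with small λ) follows the paper's description.

```python
import numpy as np
from scipy.stats import invgamma
from sklearn.linear_model import RidgeClassifier

rng = np.random.default_rng(0)
d, n = 200, 400          # dimension d and sample size n (sample complexity alpha = n/d = 2)
a, c = 2.0, 1.0          # inverse-Gamma shape a > 0 and scale c > 0 (illustrative values)
lam = 1e-5               # small ridge regularisation strength, as in the paper's Fig. 1

# Cluster mean mu = eta / sqrt(d), with eta ~ N(0, I_d)
mu = rng.standard_normal(d) / np.sqrt(d)

# Balanced labels y in {-1, +1}; each sample gets its own variance
# Delta ~ InvGamma(a, scale=c), producing heavy-tailed features:
# x = y * mu + sqrt(Delta) * z, with z ~ N(0, I_d)
y = rng.choice([-1, 1], size=n)
Delta = invgamma.rvs(a, scale=c, size=n, random_state=0)
X = y[:, None] * mu + np.sqrt(Delta)[:, None] * rng.standard_normal((n, d))

# Ridge (square-loss) classification with l2 penalty lambda
clf = RidgeClassifier(alpha=lam).fit(X, y)
train_err = np.mean(clf.predict(X) != y)
```

Varying `a` controls the tail weight of the effective feature distribution (smaller `a` gives heavier tails), which is the axis along which the paper compares theory and simulation.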