Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Choice of V for V-Fold Cross-Validation in Least-Squares Density Estimation
Authors: Sylvain Arlot, Matthieu Lerasle
JMLR 2016 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show that these variances depend on $V$ like $1 + 4/(V-1)$, at least in some particular cases, suggesting that the performance increases much from $V = 2$ to $V = 5$ or $10$, and then is almost constant. Overall, this can explain the common advice to take $V = 5$ (at least in our setting and when the computational power is limited), as supported by some simulation experiments. (...) This section illustrates the main theoretical results of the paper with some experiments on synthetic data. |
| Researcher Affiliation | Academia | Sylvain Arlot EMAIL Laboratoire de Mathématiques d'Orsay, Univ. Paris-Sud, CNRS, Université Paris-Saclay, 91405 Orsay, France. Matthieu Lerasle EMAIL CNRS, Univ. Nice Sophia Antipolis, LJAD CNRS UMR 7351, 06100 Nice, France. |
| Pseudocode | Yes | Algorithm 1. Input: $\mathcal{B}$ some partition of $\{1, \ldots, n\}$ satisfying (Reg), $\xi_1, \ldots, \xi_n \in \mathcal{X}$ and $(\psi_\lambda)_{\lambda \in \Lambda_m}$ a finite orthonormal family of $L^2(\mu)$. 1. For $i \in \{1, \ldots, V\}$ and $\lambda \in \Lambda_m$, compute $A_{i,\lambda} := \frac{V}{n} \sum_{j \in \mathcal{B}_i} \psi_\lambda(\xi_j)$. 2. For $i, j \in \{1, \ldots, V\}$, compute $C_{i,j} := \sum_{\lambda \in \Lambda_m} A_{i,\lambda} A_{j,\lambda}$. 3. Compute $S := \sum_{1 \le i,j \le V} C_{i,j}$ and $T := \operatorname{tr}(C)$. Empirical risk: $P_n \gamma(\widehat{s}_m) = -S/V^2$. $V$-fold cross-validation criterion: $\operatorname{crit}_{\mathrm{VFCV}}(m) = \frac{T}{V(V-1)} - \frac{S-T}{(V-1)^2}$. $V$-fold penalty: $\operatorname{pen}_{\mathrm{VF}}(m) = \frac{\operatorname{crit}_{\mathrm{VFCV}}(m) - P_n \gamma(\widehat{s}_m)}{V - 1/2}$. |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described. |
| Open Datasets | No | In this section, we take $\mathcal{X} = [0, 1]$ and $\mu$ is the Lebesgue measure on $\mathcal{X}$. Two examples are considered for the target density $s$ and for the collection of models $(S_m)_{m \in \mathcal{M}_n}$. Two density functions $s$ are considered, see Figure 1: Setting L: $s(x) = \frac{10x}{3} \mathbf{1}_{0 \le x < 1/3} + \left(1 + \frac{x}{3}\right) \mathbf{1}_{1 \ge x \ge 1/3}$. Setting S: $s$ is the mixture of the piecewise linear density $x \mapsto (8x - 4) \mathbf{1}_{1 \ge x \ge 1/2}$ (with weight 0.8) and four truncated Gaussian densities with means $(k/10)_{k=1,\ldots,4}$ and standard deviation $1/60$ (each with weight 0.05). |
| Dataset Splits | Yes | Let $V \in \{2, \ldots, n\}$ be a positive integer and let $\mathcal{B} = \mathcal{B}_{[\![V]\!]} = (\mathcal{B}_1, \ldots, \mathcal{B}_V)$ be some partition of $[\![n]\!]$. The $V$-fold cross-validation criterion is defined by $\operatorname{crit}_{\mathrm{VFCV}}(m, \mathcal{B}) := \frac{1}{V} \sum_{K=1}^{V} \operatorname{crit}_{\mathrm{HO}}(m, \mathcal{B}_K^c)$. |
| Hardware Specification | No | The paper does not provide specific hardware details for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers. |
| Experiment Setup | Yes | Since it is often suggested to multiply the usual penalties by some factor larger than one (Arlot, 2008), we consider all penalties above multiplied by a factor $C \in [0, 10]$. Complete results can be found in Section G of the Online Appendix. |
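
The three steps of Algorithm 1 quoted in the Pseudocode row can be sketched in plain Python as follows. This is an illustrative sketch, not the authors' code (the paper releases none). The function names `histogram_basis`, `vfold_quantities` and `brute_force_vfcv` are hypothetical, the regular histogram basis is an assumption chosen only because it is an easy orthonormal family of $L^2([0, 1])$, and the blocks are assumed to have equal size $n/V$ as a stand-in for condition (Reg). The output formulas, in particular the $V - 1/2$ normalization of the penalty, follow a reading of the extracted pseudocode and should be checked against the paper.

```python
import math
import random


def histogram_basis(D):
    """Assumed example basis: psi_k = sqrt(D) * 1_{[k/D, (k+1)/D)}, k = 0..D-1."""
    def make(k):
        return lambda x: math.sqrt(D) if k / D <= x < (k + 1) / D else 0.0
    return [make(k) for k in range(D)]


def vfold_quantities(xi, blocks, basis):
    """Return (empirical risk, V-fold CV criterion, V-fold penalty).

    xi: list of n sample points; blocks: V index lists of equal size n/V;
    basis: list of callables forming an orthonormal family.
    """
    n, V = len(xi), len(blocks)
    # Step 1: A[i][lam] = (V/n) * sum over j in block i of psi_lam(xi_j)
    A = [[(V / n) * sum(psi(xi[j]) for j in block) for psi in basis]
         for block in blocks]
    # Step 2: C[i][j] = sum over lam of A[i][lam] * A[j][lam]
    C = [[sum(a * b for a, b in zip(A[i], A[j])) for j in range(V)]
         for i in range(V)]
    # Step 3: S = sum of all entries of C, T = trace of C
    S = sum(sum(row) for row in C)
    T = sum(C[i][i] for i in range(V))
    emp_risk = -S / V**2
    crit_vfcv = T / (V * (V - 1)) - (S - T) / (V - 1) ** 2
    # Normalization by (V - 1/2) is a reconstruction from the extracted text.
    pen_vf = (crit_vfcv - emp_risk) / (V - 0.5)
    return emp_risk, crit_vfcv, pen_vf


def brute_force_vfcv(xi, blocks, basis):
    """Direct V-fold CV for comparison: fit on V-1 blocks, test on the other."""
    V = len(blocks)
    total = 0.0
    for K in range(V):
        train = [j for i, b in enumerate(blocks) if i != K for j in b]
        test = blocks[K]
        for psi in basis:
            coef = sum(psi(xi[j]) for j in train) / len(train)
            test_mean = sum(psi(xi[j]) for j in test) / len(test)
            # Least-squares contrast: ||t||^2 - 2 * (test average of t)
            total += coef**2 - 2 * coef * test_mean
    return total / V


random.seed(0)
n, V, D = 100, 5, 8
xi = [random.random() for _ in range(n)]          # uniform sample on [0, 1)
blocks = [list(range(i, n, V)) for i in range(V)]  # V blocks of size n/V
emp_risk, crit_vfcv, pen_vf = vfold_quantities(xi, blocks, histogram_basis(D))
print(emp_risk, crit_vfcv, pen_vf)
```

With equal-size blocks, the closed form in `vfold_quantities` should agree with `brute_force_vfcv` up to floating-point error. After the matrix `A` is built, the remaining work depends only on $V$ and the number of basis functions, not on $n$.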