Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Choice of V for V-Fold Cross-Validation in Least-Squares Density Estimation

Authors: Sylvain Arlot, Matthieu Lerasle

JMLR 2016 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental We show that these variances depend on V like 1 + 4/(V - 1), at least in some particular cases, suggesting that the performance increases much from V = 2 to V = 5 or 10, and then is almost constant. Overall, this can explain the common advice to take V = 5, at least in our setting and when the computational power is limited, as supported by some simulation experiments. (...) This section illustrates the main theoretical results of the paper with some experiments on synthetic data.
Researcher Affiliation Academia Sylvain Arlot EMAIL Laboratoire de Mathématiques d'Orsay, Univ. Paris-Sud, CNRS, Université Paris-Saclay, 91405 Orsay, France; Matthieu Lerasle EMAIL CNRS, Univ. Nice Sophia Antipolis, LJAD, CNRS UMR 7351, 06100 Nice, France
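Pseudocode Yes Algorithm 1 Input: $B$ some partition of $\{1, \ldots, n\}$ satisfying (Reg), $\xi_1, \ldots, \xi_n \in X$ and $(\psi_\lambda)_{\lambda \in \Lambda_m}$ a finite orthonormal family of $L^2(\mu)$. 1. For $i \in \{1, \ldots, V\}$ and $\lambda \in \Lambda_m$, compute $A_{i,\lambda} := \frac{V}{n} \sum_{j \in B_i} \psi_\lambda(\xi_j)$. 2. For $i, j \in \{1, \ldots, V\}$, compute $C_{i,j} := \sum_{\lambda \in \Lambda_m} A_{i,\lambda} A_{j,\lambda}$. 3. Compute $S := \sum_{1 \le i,j \le V} C_{i,j}$ and $T := \mathrm{tr}(C)$. Empirical risk: $P_n \gamma(\widehat{s}_m) = -\frac{S}{V^2}$; V-fold cross-validation criterion: $\mathrm{crit}_{\mathrm{VFCV}}(m) = \frac{T}{V(V-1)} - \frac{S - T}{(V-1)^2}$; V-fold penalty: $\mathrm{pen}_{\mathrm{VF}}(m) = \frac{\mathrm{crit}_{\mathrm{VFCV}}(m) - P_n \gamma(\widehat{s}_m)}{V - 1/2}$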
Open Source Code No The paper does not provide concrete access to source code for the methodology described.
Open Datasets No In this section, we take $X = [0, 1]$ and $\mu$ is the Lebesgue measure on $X$. Two examples are considered for the target density $s$ and for the collection of models $(S_m)_{m \in \mathcal{M}_n}$. Two density functions $s$ are considered, see Figure 1: Setting L: $s(x) = \frac{10x}{3} \mathbf{1}_{0 \le x < 1/3} + \left(1 + \frac{x}{3}\right) \mathbf{1}_{1 \ge x \ge 1/3}$. Setting S: $s$ is the mixture of the piecewise linear density $x \mapsto (8x - 4) \mathbf{1}_{1 \ge x \ge 1/2}$ (with weight 0.8) and four truncated Gaussian densities with means $(k/10)_{k=1,\ldots,4}$ and standard deviation $1/60$ (each with weight 0.05).
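Dataset Splits Yes Let $V \in \{2, \ldots, n\}$ be a positive integer and let $B = B_{\llbracket V \rrbracket} = (B_1, \ldots, B_V)$ be some partition of $\llbracket n \rrbracket$. The V-fold cross-validation criterion is defined by $\mathrm{crit}_{\mathrm{VFCV}}(m, B) := \frac{1}{V} \sum_{K=1}^{V} \mathrm{crit}_{\mathrm{HO}}(m, B_K^c)$.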
Hardware Specification No The paper does not provide specific hardware details for running its experiments.
Software Dependencies No The paper does not provide specific ancillary software details with version numbers.
Experiment Setup Yes Since it is often suggested to multiply the usual penalties by some factor larger than one (Arlot, 2008), we consider all penalties above multiplied by a factor $C \in [0, 10]$. Complete results can be found in Section G of the Online Appendix.
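
Algorithm 1, quoted in the Pseudocode row, can be sketched in a few lines of NumPy. This is an illustrative sketch, not the authors' code (the Open Source Code row says none was released). It assumes that (Reg) means blocks of equal size n/V, which is what the V/n normalization in step 1 suggests, and it uses a regular histogram basis on [0, 1] as one example of an orthonormal family. The function names `histogram_basis` and `vfold_criteria` are hypothetical.

```python
import numpy as np


def histogram_basis(x, n_bins):
    """Evaluate an orthonormal regular-histogram basis of L2([0, 1]) at x.

    Assumed example family: psi_k = sqrt(n_bins) * indicator of the k-th bin.
    Returns an array of shape (len(x), n_bins).
    """
    x = np.asarray(x, dtype=float)
    # Bin index of each point; x == 1 is placed in the last bin.
    idx = np.minimum((x * n_bins).astype(int), n_bins - 1)
    psi = np.zeros((x.size, n_bins))
    psi[np.arange(x.size), idx] = np.sqrt(n_bins)
    return psi


def vfold_criteria(psi, blocks):
    """Closed-form quantities of Algorithm 1.

    psi:    (n, |Lambda_m|) matrix with entries psi_lambda(xi_j).
    blocks: list of V index arrays of equal size partitioning range(n)
            (equal sizes are an assumption about what (Reg) requires).
    Returns (empirical risk, V-fold CV criterion, V-fold penalty).
    """
    n = psi.shape[0]
    V = len(blocks)
    # Step 1: A[i, lambda] = (V / n) * sum_{j in B_i} psi_lambda(xi_j).
    A = np.stack([(V / n) * psi[b].sum(axis=0) for b in blocks])
    # Step 2: C[i, j] = sum_lambda A[i, lambda] * A[j, lambda].
    C = A @ A.T
    # Step 3: S is the sum of all entries of C, T is its trace.
    S = C.sum()
    T = np.trace(C)
    emp_risk = -S / V**2
    crit_vfcv = T / (V * (V - 1)) - (S - T) / (V - 1) ** 2
    pen_vf = (crit_vfcv - emp_risk) / (V - 0.5)
    return emp_risk, crit_vfcv, pen_vf


# Usage on synthetic data: n = 60 uniform points, V = 5 blocks, 4 bins.
rng = np.random.default_rng(0)
xi = rng.random(60)
blocks = np.array_split(rng.permutation(60), 5)
emp_risk, crit_vfcv, pen_vf = vfold_criteria(histogram_basis(xi, 4), blocks)
print(emp_risk, crit_vfcv, pen_vf)
```

The appeal of this formulation is that the V x V matrix C is computed once, so the criterion never requires refitting the estimator on each training set.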
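
The two synthetic target densities quoted in the Open Datasets row can be written down directly, which makes it easy to check that each integrates to one. The sketch below is an assumption-laden illustration: the function names are hypothetical, and the Gaussian components are taken to be truncated to [0, 1] and renormalized, since the quoted text does not state the truncation interval.

```python
import math

import numpy as np


def density_L(x):
    """Setting L: (10x/3) on [0, 1/3) and (1 + x/3) on [1/3, 1]."""
    x = np.asarray(x, dtype=float)
    inside = (x >= 0) & (x <= 1)
    return np.where(x < 1 / 3, 10 * x / 3, 1 + x / 3) * inside


def truncated_gaussian(x, mean, sd):
    """Gaussian density restricted to [0, 1] and renormalized.

    The truncation interval [0, 1] is an assumption, not stated in the source.
    """
    x = np.asarray(x, dtype=float)
    pdf = np.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

    def cdf(t):
        return 0.5 * (1 + math.erf((t - mean) / (sd * math.sqrt(2))))

    mass = cdf(1.0) - cdf(0.0)  # probability mass the Gaussian puts on [0, 1]
    return pdf / mass * ((x >= 0) & (x <= 1))


def density_S(x):
    """Setting S: 0.8 * piecewise-linear part + 0.05 * each of four bumps."""
    x = np.asarray(x, dtype=float)
    linear = (8 * x - 4) * ((x >= 0.5) & (x <= 1))
    bumps = sum(truncated_gaussian(x, k / 10, 1 / 60) for k in range(1, 5))
    return 0.8 * linear + 0.05 * bumps


# Midpoint-rule check that both densities integrate to (approximately) one.
N = 200_000
grid = (np.arange(N) + 0.5) / N
print(density_L(grid).mean(), density_S(grid).mean())
```

With means at 0.1 to 0.4 and a standard deviation of 1/60, the Gaussian components sit at least six standard deviations from either endpoint, so under this assumption the renormalization is numerically negligible.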