Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et alK. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026..
Efficient Sampling for Learning Sparse Additive Models in High Dimensions
Authors: Hemant Tyagi, Bernd Gärtner, Andreas Krause
NeurIPS 2014 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Simulation results. We now provide simulation results on synthetic data to support our theoretical findings. We consider the noisy setting with the point queries being corrupted with Gaussian noise. For d = 1000, k = 4 and S = {2, 105, 424, 782}, consider f : Rd R where f = φ2(x2) + φ105(x105) + φ424(x424) + φ782(x782) with: φ2(x) = sin(πx), φ105(x) = exp( 2x), φ424(x) = (1/3) cos3(πx) + 0.8x2, φ782(x) = 0.5x4 x2 + 0.8x. We choose δ = 0.3, D = 0.2 which can be verified as valid parameters for the above φl s. Furthermore, we choose mx = 2/δ = 7 and mv = 2k log d = 56 to satisfy the conditions of Theorem 4. Next, we choose constants C = 0.2, B2 = 35 and κ = 0.95 D2 16C2k B2 = 4.24 10 4 as required by Theorem 4. For the choice 2Ck B2 = 0.0267, we then query f at (2mx + 1)(mv + 1) = 855 points. The function values are corrupted with Gaussian noise: N(0, σ2/N) for σ = 0.01 and N = 100. This is equivalent to resampling and averaging the points queries N times. Importantly the sufficient condition on N, as stated in Theorem 4 is σ2 κp ) = 6974 for p = 0.1. Thus we consider a significantly undersampled regime. Lastly we select the threshold τ = mv 2κ = 0.2875 as stated by Theorem 4, and employ Algorithm 1 for different values of the smoothing parameter γ. Over 10 independent runs of the algorithm we observed that S was recovered exactly each time. |
| Researcher Affiliation | Academia | Hemant Tyagi ETH Z urich EMAIL Andreas Krause ETH Z urich EMAIL Bernd G artner ETH Z urich EMAIL |
| Pseudocode | Yes | Algorithm 1 Algorithm for learning φl in the SPAM: f(x) = P 1: Choose mx, mv and construct sampling sets X and V as in (3.2), (3.3). 2: Choose step size ϵ > 0. Query f at f(ξi),f(ξi+ϵvj) for i = mx, . . . , mx and j = 1, . . . , mv. 3: Construct yi where yi,j = f(ξi+ϵvj) f(ξi) ϵ for i = mx, . . . , mx and j = 1, . . . , mv. 4: Set bxi := argmin yi=Vz z 1. For τ > 0 compute b S = mx i= mx {l {1, . . . , d} : |bxi,l| > τ}. 5: For each l b S, run (P) as defined in Section 4 using (bxi,l)mx i= mx, τ and some smoothing parameter γ 0, to obtain φ l. 6: For each l b S, set φest,l to be the piece-wise integral of φ l as explained in Section 4. |
| Open Source Code | No | The paper does not provide any explicit statement about releasing source code or include a link to a code repository. |
| Open Datasets | No | The paper uses synthetic data generated according to specified functions and noise models, but does not provide access information for a publicly available or open dataset. The data is generated programmatically within the described setup. |
| Dataset Splits | No | The paper uses synthetic data and focuses on recovery accuracy, but does not explicitly mention training, validation, or test dataset splits in the context of machine learning model training. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers (e.g., library or solver names). |
| Experiment Setup | Yes | For d = 1000, k = 4 and S = {2, 105, 424, 782}, consider f : Rd R where f = φ2(x2) + φ105(x105) + φ424(x424) + φ782(x782) with: φ2(x) = sin(πx), φ105(x) = exp( 2x), φ424(x) = (1/3) cos3(πx) + 0.8x2, φ782(x) = 0.5x4 x2 + 0.8x. We choose δ = 0.3, D = 0.2 which can be verified as valid parameters for the above φl s. Furthermore, we choose mx = 2/δ = 7 and mv = 2k log d = 56 to satisfy the conditions of Theorem 4. Next, we choose constants C = 0.2, B2 = 35 and κ = 0.95 D2 16C2k B2 = 4.24 10 4 as required by Theorem 4. For the choice 2Ck B2 = 0.0267, we then query f at (2mx + 1)(mv + 1) = 855 points. The function values are corrupted with Gaussian noise: N(0, σ2/N) for σ = 0.01 and N = 100. This is equivalent to resampling and averaging the points queries N times. Importantly the sufficient condition on N, as stated in Theorem 4 is σ2 κp ) = 6974 for p = 0.1. Thus we consider a significantly undersampled regime. Lastly we select the threshold τ = mv 2κ = 0.2875 as stated by Theorem 4, and employ Algorithm 1 for different values of the smoothing parameter γ. |