Efficient Sampling for Learning Sparse Additive Models in High Dimensions

Authors: Hemant Tyagi, Bernd Gärtner, Andreas Krause

NeurIPS 2014

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Simulation results. We now provide simulation results on synthetic data to support our theoretical findings. We consider the noisy setting, with the point queries corrupted by Gaussian noise. For d = 1000, k = 4 and S = {2, 105, 424, 782}, consider f : R^d → R where f = φ₂(x₂) + φ₁₀₅(x₁₀₅) + φ₄₂₄(x₄₂₄) + φ₇₈₂(x₇₈₂) with φ₂(x) = sin(πx), φ₁₀₅(x) = exp(−2x), φ₄₂₄(x) = (1/3) cos³(πx) + 0.8x², φ₇₈₂(x) = 0.5x⁴ − x² + 0.8x. We choose δ = 0.3, D = 0.2, which can be verified as valid parameters for the above φ_l's. Furthermore, we choose m_x = ⌈2/δ⌉ = 7 and m_v = ⌈2k log d⌉ = 56 to satisfy the conditions of Theorem 4. Next, we choose constants C = 0.2, B₂ = 35 and κ = 0.95 · D²/(16C²kB₂) = 4.24 × 10⁻⁴ as required by Theorem 4. For the choice of step size ϵ = 0.0267, we then query f at (2m_x + 1)(m_v + 1) = 855 points. The function values are corrupted with Gaussian noise N(0, σ²/N) for σ = 0.01 and N = 100; this is equivalent to resampling and averaging the point queries N times. Importantly, the sufficient condition on N stated in Theorem 4 evaluates to N ≥ 6974 for p = 0.1, so we consider a significantly undersampled regime. Lastly, we select the threshold τ = 0.2875 as stated by Theorem 4 and employ Algorithm 1 for different values of the smoothing parameter γ. Over 10 independent runs of the algorithm, we observed that S was recovered exactly each time. (A minimal simulation sketch of this setup follows the table.)
Researcher Affiliation | Academia | Hemant Tyagi, ETH Zürich, htyagi@inf.ethz.ch; Andreas Krause, ETH Zürich, krausea@ethz.ch; Bernd Gärtner, ETH Zürich, gaertner@inf.ethz.ch
Pseudocode | Yes | Algorithm 1 (learning the φ_l in the SPAM f(x) = Σ_{l∈S} φ_l(x_l)):
1. Choose m_x, m_v and construct sampling sets X and V as in (3.2), (3.3).
2. Choose step size ϵ > 0. Query f at ξ_i and ξ_i + ϵv_j for i = −m_x, ..., m_x and j = 1, ..., m_v.
3. Construct y_i where y_{i,j} = (f(ξ_i + ϵv_j) − f(ξ_i))/ϵ for i = −m_x, ..., m_x and j = 1, ..., m_v.
4. Set x̂_i := argmin_{y_i = Vz} ‖z‖₁. For τ > 0, compute Ŝ = ∪_{i=−m_x}^{m_x} {l ∈ {1, ..., d} : |x̂_{i,l}| > τ}.
5. For each l ∈ Ŝ, run (P) as defined in Section 4 using (x̂_{i,l})_{i=−m_x}^{m_x}, τ and some smoothing parameter γ ≥ 0, to obtain φ̂′_l.
6. For each l ∈ Ŝ, set φ_{est,l} to be the piecewise integral of φ̂′_l as explained in Section 4.
(A Python sketch of steps 1-4 follows the table.)
Open Source Code | No | The paper does not provide any explicit statement about releasing source code or include a link to a code repository.
Open Datasets | No | The paper uses synthetic data generated according to specified functions and noise models, but does not provide access information for a publicly available or open dataset. The data is generated programmatically within the described setup.
Dataset Splits | No | The paper uses synthetic data and focuses on recovery accuracy, but does not explicitly mention training, validation, or test dataset splits in the context of machine learning model training.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts) used for running its experiments.
Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers (e.g., library or solver names).
Experiment Setup | Yes | For d = 1000, k = 4 and S = {2, 105, 424, 782}, consider f : R^d → R where f = φ₂(x₂) + φ₁₀₅(x₁₀₅) + φ₄₂₄(x₄₂₄) + φ₇₈₂(x₇₈₂) with φ₂(x) = sin(πx), φ₁₀₅(x) = exp(−2x), φ₄₂₄(x) = (1/3) cos³(πx) + 0.8x², φ₇₈₂(x) = 0.5x⁴ − x² + 0.8x. We choose δ = 0.3, D = 0.2, which can be verified as valid parameters for the above φ_l's. Furthermore, we choose m_x = ⌈2/δ⌉ = 7 and m_v = ⌈2k log d⌉ = 56 to satisfy the conditions of Theorem 4. Next, we choose constants C = 0.2, B₂ = 35 and κ = 0.95 · D²/(16C²kB₂) = 4.24 × 10⁻⁴ as required by Theorem 4. For the choice of step size ϵ = 0.0267, we then query f at (2m_x + 1)(m_v + 1) = 855 points. The function values are corrupted with Gaussian noise N(0, σ²/N) for σ = 0.01 and N = 100; this is equivalent to resampling and averaging the point queries N times. Importantly, the sufficient condition on N stated in Theorem 4 evaluates to N ≥ 6974 for p = 0.1, so we consider a significantly undersampled regime. Lastly, we select the threshold τ = 0.2875 as stated by Theorem 4 and employ Algorithm 1 for different values of the smoothing parameter γ. (A driver sketch with these parameter values follows the table.)
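
The simulation setup quoted under Research Type and Experiment Setup translates directly into a synthetic test harness. Below is a minimal sketch (Python with NumPy assumed; all names, the seed, and the oracle implementation are illustrative, not the authors' code) of the target SPAM f and the noisy point-query oracle:

```python
import numpy as np

d = 1000                      # ambient dimension
S = [2, 105, 424, 782]        # active coordinates (1-indexed, as in the paper)

# The four univariate components phi_l from the quoted setup.
phi = {
    2:   lambda t: np.sin(np.pi * t),
    105: lambda t: np.exp(-2.0 * t),
    424: lambda t: (1.0 / 3.0) * np.cos(np.pi * t) ** 3 + 0.8 * t ** 2,
    782: lambda t: 0.5 * t ** 4 - t ** 2 + 0.8 * t,
}

def f(x):
    """Sparse additive model f(x) = sum_{l in S} phi_l(x_l)."""
    return sum(phi[l](x[l - 1]) for l in S)   # l - 1: arrays are 0-indexed

sigma, N = 0.01, 100
rng = np.random.default_rng(0)                # fixed seed, an assumption

def noisy_query(x):
    """Point query corrupted with N(0, sigma^2 / N) noise -- equivalent to
    resampling and averaging N queries each corrupted with N(0, sigma^2)."""
    return f(x) + rng.normal(0.0, sigma / np.sqrt(N))
```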
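
Steps 1-4 of Algorithm 1 can be made concrete. The sampling sets (3.2), (3.3) are not quoted above, so the diagonal base points ξ_i = (i/m_x)(1, ..., 1) and Bernoulli ±1/√m_v directions below are assumptions of this sketch, and the equality-constrained basis-pursuit LP (solved with SciPy's linprog) is a generic stand-in for the ℓ1 minimization in step 4:

```python
import numpy as np
from scipy.optimize import linprog

def basis_pursuit(V, y):
    """Step 4: min ||z||_1 subject to V z = y, via the standard LP
    reformulation with variables [z; t] and |z| <= t elementwise."""
    mv, d = V.shape
    c = np.concatenate([np.zeros(d), np.ones(d)])     # minimize sum(t)
    eye = np.eye(d)
    A_ub = np.block([[eye, -eye], [-eye, -eye]])      # z - t <= 0, -z - t <= 0
    b_ub = np.zeros(2 * d)
    A_eq = np.hstack([V, np.zeros((mv, d))])          # V z = y
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=y,
                  bounds=[(None, None)] * d + [(0, None)] * d)
    return res.x[:d]

def recover_support(noisy_query, d, mx, mv, eps, tau, rng):
    """Steps 1-4 of Algorithm 1 under the sampling-set assumptions above."""
    V = rng.choice([-1.0, 1.0], size=(mv, d)) / np.sqrt(mv)  # assumed directions
    S_hat = set()
    for i in range(-mx, mx + 1):
        xi = np.full(d, i / mx)              # assumed diagonal design point
        f_xi = noisy_query(xi)
        # Step 3: finite-difference estimates of <grad f(xi), v_j>.
        y = np.array([(noisy_query(xi + eps * V[j]) - f_xi) / eps
                      for j in range(mv)])
        x_hat = basis_pursuit(V, y)
        S_hat |= {l + 1 for l in np.nonzero(np.abs(x_hat) > tau)[0]}  # 1-indexed
    return S_hat
```

Steps 5-6 (the program (P) and the piecewise integration of the estimated φ'_l) depend on Section 4 details not quoted here, so they are omitted from the sketch.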
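
Finally, a hypothetical driver tying the quoted parameter values together; it reuses f, noisy_query and recover_support from the sketches above and simply checks exact support recovery over 10 independent runs, mirroring the reported result:

```python
S_true = {2, 105, 424, 782}
d, mx, mv = 1000, 7, 56         # m_x = ceil(2/delta), m_v = ceil(2 k log d)
eps, tau = 0.0267, 0.2875       # step size and threshold quoted above

hits = 0
for run in range(10):
    rng = np.random.default_rng(run)   # per-run seed, an assumption
    hits += (recover_support(noisy_query, d, mx, mv, eps, tau, rng) == S_true)
print(f"exact recovery of S in {hits}/10 runs")   # paper reports 10/10
```

Each call issues (2m_x + 1)(m_v + 1) = 15 · 57 = 855 noisy point queries, matching the query count quoted above.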