On the Complexity of Learning Sparse Functions with Statistical and Gradient Queries

Authors: Nirmit Joshi, Theodor Misiakiewicz, Nati Srebro

NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To demonstrate this, Section 7 considers learning sparse functions on the hypercube using online SGD on a two-layer neural network. We show that in this setting, the DLQ_ℓ leap exponent correctly captures learnability in the mean-field regime and linear scaling. ... Numerical Simulation. We consider a function similar to y_2 in (2) with Leap_CSQ = 3 but Leap_SQ = 1. Specifically, we set P = 4 and C = {{1, 2, 3}, {1, 2, 4}, {1, 3, 4}, {2, 3, 4}}, and define y = h*(z) where ... We train with online 1-batch SGD (1-b SGD) on a two-layer net with different loss functions (without any regularization) and stepsize η ∝ 1/d, where we consider ambient dimensions d ∈ {100, 300, 500}. In Figure 1, we plot the test mean squared error versus η × (SGD iterations) (thus also scaled with 1/d), over 10 trials. (A hedged sketch of this run appears as the second code block after the table.)
Researcher Affiliation | Academia | Nirmit Joshi, Toyota Technological Institute at Chicago, nirmit@ttic.edu; Theodor Misiakiewicz, Toyota Technological Institute at Chicago, theodor.misiakiewicz@ttic.edu; Nathan Srebro, Toyota Technological Institute at Chicago, nati@ttic.edu
Pseudocode | No | The paper describes the mathematical formulations of query types and dynamics (e.g., the DLQ definition in Section 3, the SGD update rules in Section D), but it does not present any formal pseudocode or algorithm blocks.
Open Source Code | Yes | Question: Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material? Answer: [Yes] Justification: We attach the code in the supplemental material.
Open Datasets | No | We consider a function similar to y_2 in (2) with Leap_CSQ = 3 but Leap_SQ = 1. Specifically, we set P = 4 and C = {{1, 2, 3}, {1, 2, 4}, {1, 3, 4}, {2, 3, 4}}, and define y = h*(z), where h*(z) = Σ_{U ∈ C} ĥ*(U) χ_U(z) with ĥ*(U) ~ Unif([-2, 2]) for all U ∈ C. The paper describes a synthetic data generation process but does not refer to a named publicly available dataset or provide a link to the generated data. (A hedged data-generation sketch appears as the first code block after the table.)
Dataset Splits | No | The paper defines a 'test error' in Section 7 and mentions 'online SGD' and batch sizes, but it does not specify explicit training, validation, or test dataset splits with percentages or counts.
Hardware Specification | Yes | Question: For each experiment, does the paper provide sufficient information on the computer resources (type of compute workers, memory, time of execution) needed to reproduce the experiments? Answer: [Yes] Justification: All experiments are relatively small scale and can be reproduced under 1 hr of compute on a standard computer with 16 GB RAM.
Software Dependencies | No | The paper describes the use of neural networks and SGD, implying standard machine learning libraries, but it does not specify any software names with version numbers in the main text or the supplementary checklist.
Experiment Setup | Yes | We train with online 1-batch SGD (1-b SGD) on a two-layer net with different loss functions (without any regularization) and stepsize η ∝ 1/d, where we consider ambient dimensions d ∈ {100, 300, 500}. ... Choose a loss ℓ: R × Y → R≥0 and the activation σ(x) = (1 + x)^L for L ≥ 28P. We train for Θ(1) total steps in two phases and a batch size of Θ(d) with the following choice of hyperparameters. No regularization: set the regularization parameters λ_w = λ_a = λ_b = λ_c = 0. Initialization: Initialize the first-layer weights and biases deterministically to 0. ... The second-layer weights are sampled uniformly from [-1, 1]. ... Finally, choose c_0 ~ δ_c, i.e., c_0 = c, for the given global bias choice c ∈ R. ... We deploy a two-phase training procedure. Given parameters η and κ ∈ [1/2, 3/2]^d: 1. Phase 1: For all k ∈ {0, ..., k_1 - 1}, set η^a_k = η^b_k = η^c_k = 0 and η^w_k ∈ R^d such that (η^w_k)_i = η κ_i for all i ∈ [d]. ... 2. Phase 2: Set η^a_k = η and η^w_k = 0, η^b_k = η^c_k = 0 for k ∈ {k_1, ..., k_1 + k_2 - 1}. (A hedged sketch of this two-phase schedule appears as the third code block after the table.)
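
The synthetic task quoted in the Research Type and Open Datasets rows can be made concrete with a short sketch. This is a minimal reading of the quoted setup, not the authors' released code: the hidden support S (taken here as the first P coordinates), the random seed, and the batch size are assumptions, and the coefficient range Unif([-2, 2]) follows our reconstruction of the garbled excerpt.

```python
import numpy as np

rng = np.random.default_rng(0)

d = 100                      # ambient dimension (the paper uses d in {100, 300, 500})
P = 4                        # number of relevant coordinates
S = list(range(P))           # assumed hidden support: the first P coordinates
C = [(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)]   # the four 3-subsets of the support
h_hat = {U: rng.uniform(-2.0, 2.0) for U in C}     # coefficients, assumed Unif([-2, 2])

def sample_batch(n):
    """Draw n inputs x ~ Unif({-1, +1}^d) and labels y = sum_U h_hat(U) * prod_{i in U} x_i."""
    x = rng.choice([-1.0, 1.0], size=(n, d))
    z = x[:, S]                                    # restrict to the relevant coordinates
    y = sum(h_hat[U] * np.prod(z[:, list(U)], axis=1) for U in C)
    return x, y

x_train, y_train = sample_batch(1024)              # example draw
```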
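For the Figure 1-style run quoted in the Research Type row (online 1-batch SGD, stepsize η ∝ 1/d, test MSE tracked against η × iterations), a hedged sketch follows. It reuses sample_batch from the block above; the width, ReLU activation, squared loss, iteration budget, and evaluation schedule are illustrative assumptions rather than the paper's choices.

```python
import torch

torch.manual_seed(0)
d, width = 100, 512                            # width is an illustrative choice
eta = 1.0 / d                                  # stepsize scaling eta ∝ 1/d

model = torch.nn.Sequential(                   # generic two-layer net (ReLU assumed)
    torch.nn.Linear(d, width),
    torch.nn.ReLU(),
    torch.nn.Linear(width, 1),
)
opt = torch.optim.SGD(model.parameters(), lr=eta)   # plain SGD, no regularization
loss_fn = torch.nn.MSELoss()

x_np, y_np = sample_batch(2048)                # held-out test set from the sampler above
x_test = torch.tensor(x_np, dtype=torch.float32)
y_test = torch.tensor(y_np, dtype=torch.float32).unsqueeze(1)

curve = []                                     # (eta * iterations, test MSE) pairs for plotting
for step in range(1, 20001):
    xb_np, yb_np = sample_batch(1)             # online: one fresh sample per step
    xb = torch.tensor(xb_np, dtype=torch.float32)
    yb = torch.tensor(yb_np, dtype=torch.float32).unsqueeze(1)
    opt.zero_grad()
    loss_fn(model(xb), yb).backward()
    opt.step()
    if step % 1000 == 0:
        with torch.no_grad():
            curve.append((eta * step, loss_fn(model(x_test), y_test).item()))
```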
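The two-phase schedule quoted in the Experiment Setup row (Phase 1 trains only the first-layer weights with per-coordinate rates η κ_i; Phase 2 trains only the second layer with rate η; the biases b and c keep zero learning rate) can be sketched as below. The activation exponent L, the phase lengths k_1 and k_2, the width, the squared loss, the choice c = 0, the 1/width mean-field scaling, and the κ ∈ [1/2, 3/2]^d range are assumptions on top of the quoted description; it also reuses sample_batch from the first block.

```python
import torch

torch.manual_seed(0)
d, width, L = 100, 256, 6                      # width and L are illustrative choices
eta, k1, k2 = 1.0 / d, 500, 500                # stepsize eta ∝ 1/d; phase lengths assumed
kappa = torch.empty(d).uniform_(0.5, 1.5)      # assumed per-coordinate kappa in [1/2, 3/2]

W = torch.zeros(width, d, requires_grad=True)  # first-layer weights: deterministic 0 init
b = torch.zeros(width, requires_grad=True)     # first-layer biases: 0 init (never updated)
a = (2 * torch.rand(width, 1) - 1).requires_grad_()  # second layer ~ Unif([-1, 1])
c = torch.zeros(1, requires_grad=True)         # global bias c_0 = c, with c = 0 assumed

def forward(x):
    # two-layer net with activation (1 + <w, x> + b)^L and assumed 1/width scaling
    return ((x @ W.t() + b + 1.0) ** L) @ a / width + c

def sgd_step(x, y, lr_w, lr_a):
    # one SGD step on the squared loss; b and c get zero learning rate in both phases
    loss = ((forward(x) - y.unsqueeze(1)) ** 2).mean()
    loss.backward()
    with torch.no_grad():
        W.sub_(lr_w * kappa * W.grad)          # first-layer rates eta * kappa_i (Phase 1)
        a.sub_(lr_a * a.grad)                  # second-layer rate eta (Phase 2)
        for p in (W, b, a, c):
            p.grad = None

for k in range(k1 + k2):
    x_np, y_np = sample_batch(d)               # batch size Theta(d); sampler from the first sketch
    x = torch.tensor(x_np, dtype=torch.float32)
    y = torch.tensor(y_np, dtype=torch.float32)
    if k < k1:
        sgd_step(x, y, lr_w=eta, lr_a=0.0)     # Phase 1: only first-layer weights move
    else:
        sgd_step(x, y, lr_w=0.0, lr_a=eta)     # Phase 2: only second-layer weights move
```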