Symmetric Single Index Learning

Authors: Aaron Zweig, Joan Bruna

ICLR 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 5 EXPERIMENTS: To study an experimental setup for our setting, we consider the student-teacher setup outlined above with gradient descent. We consider N = 25, M = 100, and approximate the matrix A by capping the infinite number of rows at 150, which was sufficient for 1 − ‖PAh‖² ≤ 0.001 in numerical experiments. For the link function f, we choose its only non-zero monomial coefficients to be α3 = α4 = α5 = 1/3. Correspondingly, g simply has α3 = 1 and all other coefficients at zero. [...] Under this setup, we train full gradient descent on 50000 samples from the Vandermonde V distribution for 20000 iterations. The only parameter to be tuned is the learning rate, and we observe over the small grid of [0.001, 0.0025, 0.005] that a learning rate of 0.0025 performs best for both models in terms of the probability of r reaching approximately 1, i.e. strong recovery. Figure 1: The learning trajectory, over ten independent runs... (a training-loop sketch follows the table)
Researcher Affiliation | Academia | Aaron Zweig, Courant Institute of Mathematical Sciences, New York University, New York, NY 10012, USA (az831@nyu.edu); Joan Bruna, Courant Institute of Mathematical Sciences and Center for Data Science, New York University, New York, NY 10012, USA (bruna@cims.nyu.edu)
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any concrete access information (a specific repository link, an explicit code-release statement, or code in supplementary materials) for the source code of the described methodology.
Open Datasets | No | The paper mentions training on '50000 samples from the Vandermonde V distribution' and refers to 'the squared Vandermonde density over N copies of the complex unit circle (Macdonald, 1998)'. However, it does not provide a direct link, DOI, specific repository name, or formal citation through which this data can be publicly accessed. (a sampling sketch follows the table)
Dataset Splits | No | The paper mentions training on '50000 samples' but does not provide dataset split information (exact percentages, sample counts, citations to predefined splits, or a splitting methodology) for training, validation, or test sets.
Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types, memory amounts, or other machine specifications) used for running its experiments.
Software Dependencies | No | The paper does not provide specific ancillary software details (e.g. library or solver names with version numbers) needed to replicate the experiment.
Experiment Setup | Yes | Under this setup, we train full gradient descent on 50000 samples from the Vandermonde V distribution for 20000 iterations. The only parameter to be tuned is the learning rate, and we observe over the small grid of [0.001, 0.0025, 0.005] that a learning rate of 0.0025 performs best for both models in terms of the probability of r reaching approximately 1, i.e. strong recovery. (a grid-search sketch follows the table)
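
For concreteness, here is a minimal sketch of the student-teacher gradient-descent setup quoted in the Research Type and Experiment Setup rows. It is not the authors' code: the feature matrix Phi is a generic stand-in for the paper's symmetric (power-sum) features of Vandermonde-distributed point sets, and the square loss and the weight normalization are assumptions; only the dimensions, the link-function coefficients, and the learning rate are taken from the paper.

```python
# Hedged sketch of the student-teacher setup described above (not the authors' code).
import numpy as np

rng = np.random.default_rng(0)
M = 100                      # feature / weight dimension (paper: M = 100)
lr = 0.0025                  # best learning rate reported in the grid search
n_samples, n_iters = 50_000, 20_000

def f_teacher(z):            # teacher link: alpha_3 = alpha_4 = alpha_5 = 1/3
    return (z**3 + z**4 + z**5) / 3.0

def g_student(z):            # student link: alpha_3 = 1, all other coefficients zero
    return z**3

def g_student_prime(z):
    return 3.0 * z**2

# Placeholder features: in the paper these would be symmetric (power-sum) features
# of point sets drawn from the squared-Vandermonde distribution.
Phi = rng.standard_normal((n_samples, M)) / np.sqrt(M)

w_star = rng.standard_normal(M); w_star /= np.linalg.norm(w_star)   # teacher direction
w = rng.standard_normal(M);      w /= np.linalg.norm(w)             # student init

y = f_teacher(Phi @ w_star)                       # teacher labels

for t in range(n_iters):                          # full-batch gradient descent
    z = Phi @ w
    resid = g_student(z) - y                      # square-loss residual
    grad = Phi.T @ (2.0 * resid * g_student_prime(z)) / n_samples
    w -= lr * grad
    if t % 2000 == 0:
        r = w @ w_star / np.linalg.norm(w)        # overlap with the teacher direction
        print(f"iter {t:6d}  overlap r = {r:+.3f}")
```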
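
The 'squared Vandermonde density over N copies of the complex unit circle' cited in the Open Datasets row is, by the Weyl integration formula, the joint eigenvalue density of a Haar-random N x N unitary matrix (the Circular Unitary Ensemble). The sketch below shows one standard way to draw such samples under that identification; it is an illustration, not the authors' data pipeline, and the function name is mine.

```python
# Hedged sketch: sample the squared-Vandermonde density on the unit circle via the
# eigenvalues of a Haar-random (CUE) unitary matrix.
import numpy as np

def sample_vandermonde_points(N, rng):
    # Ginibre matrix -> QR -> phase correction gives a Haar-distributed unitary
    # (Mezzadri's recipe for random matrices from the classical compact groups).
    Z = (rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))) / np.sqrt(2)
    Q, R = np.linalg.qr(Z)
    Q = Q * (np.diagonal(R) / np.abs(np.diagonal(R)))   # fix the phases of R's diagonal
    return np.linalg.eigvals(Q)   # N points on the unit circle with |Vandermonde|^2 density

rng = np.random.default_rng(0)
x = sample_vandermonde_points(25, rng)   # one sample with N = 25, as in the experiments
print(np.abs(x))                         # all (numerically) equal to 1
```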
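
Finally, a sketch of the learning-rate selection described in the Experiment Setup row: run several independent trainings for each rate in the grid and keep the one with the highest empirical probability of strong recovery (final overlap r close to 1). The miniature train_once below reuses the loop from the first sketch with scaled-down sample and iteration counts so it runs quickly; the 0.99 recovery cutoff is my own reading of "r approximately 1".

```python
# Hedged sketch of the learning-rate grid search; placeholder features as above.
import numpy as np

def train_once(lr, seed, M=100, n=5_000, iters=2_000):
    """Scaled-down stand-in for one full training run (see the first sketch);
    returns the final overlap r between student and teacher directions."""
    rng = np.random.default_rng(seed)
    Phi = rng.standard_normal((n, M)) / np.sqrt(M)                     # placeholder features
    w_star = rng.standard_normal(M); w_star /= np.linalg.norm(w_star)  # teacher direction
    w = rng.standard_normal(M); w /= np.linalg.norm(w)                 # student init
    z_star = Phi @ w_star
    y = (z_star**3 + z_star**4 + z_star**5) / 3.0                      # teacher link f
    for _ in range(iters):
        z = Phi @ w
        w -= lr * (Phi.T @ (2.0 * (z**3 - y) * 3.0 * z**2)) / n        # student link g(z) = z^3
    return w @ w_star / np.linalg.norm(w)

for lr in [0.001, 0.0025, 0.005]:                        # grid from the paper
    finals = [train_once(lr, seed) for seed in range(10)]          # ten runs, as in Figure 1
    p = sum(r > 0.99 for r in finals) / len(finals)      # assumed cutoff for strong recovery
    print(f"lr = {lr}: empirical P(strong recovery) = {p:.1f}")
```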