Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Learning Multi-Index Models with Neural Networks via Mean-Field Langevin Dynamics
Authors: Alireza Mousavi-Hosseini, Denny Wu, Murat A Erdogdu
ICLR 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we perform numerical simulations to verify the intuitions from Theorem 3. Specifically, we train a two-layer neural network with width m = 50 and ReLU activation, where the first layer weights are initialized uniformly on the sphere, and fix the first half of the second layer coordinates at +1/m, and the second half at −1/m. The input follows the distribution x ∼ N(0, Σ) (with an extra 1 appended for bias), where Σ = diag(σ²) with σ²₁ = 1 and σ²ᵢ = (d_eff − 1)/(d − 1) for i > 1, with input dimension d = 50. The labels are generated by a single-index model of the form y = g(⟨e₁, x⟩) = ⟨e₁, x⟩² − 1. Therefore, the effective dimension from Definition 1 is exactly equal to d_eff. We train the neural network using the squared loss with MFLA, with a stepsize of 0.1, weight decay parameter 0.01, and temperature 0.001. Figure 1b shows the test loss at the end of 200 iterations of MFLA for different numbers of training samples n and effective dimensions d_eff. For each value of n and d_eff, we average the test loss over 5 independent runs with different realizations of data and initialization. In Figure 2 we measure the generalization gap, i.e. the average loss difference between the training set of n samples and a test set of 100000 samples, at the end of 3000 iterations of training with MFLA. For this experiment, we try n = 100, n = 200, and n = 500. As seen from both figures, d_eff controls the generalization gap and test loss, both of which decay with larger n. |
| Researcher Affiliation | Academia | Alireza Mousavi-Hosseini1,2, Denny Wu3,4, Murat A. Erdogdu1,2. 1University of Toronto, 2Vector Institute, 3New York University, 4Flatiron Institute. EMAIL, EMAIL |
| Pseudocode | No | The paper describes methods using mathematical equations and descriptive text, but does not contain a clearly labeled pseudocode or algorithm block. |
| Open Source Code | Yes | The code to reproduce the experimental results is provided at: https://github.com/mousavih/MFLD-Learnability. |
| Open Datasets | No | The paper uses synthetic data generated according to specific statistical distributions and models (e.g., 'The input follows the distribution x N(0, Σ)', 'The labels are generated by a single-index model') rather than relying on a publicly available or open dataset with concrete access information. |
| Dataset Splits | Yes | In Figure 2 we measure the generalization gap, i.e. the average loss difference on the training set of n samples, and a test set of 100000 samples, at the end of 3000 iterations of training with MFLA. For this experiment, we try n = 100, n = 200, and n = 500. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments. |
| Software Dependencies | No | The paper describes the methodology and training process but does not specify any software dependencies with version numbers. |
| Experiment Setup | Yes | Specifically, we train a two-layer neural network with width m = 50 and ReLU activation, where the first layer weights are initialized uniformly on the sphere, and fix the first half of the second layer coordinates at +1/m, and the second half at −1/m. We train the neural network using the squared loss with MFLA, with a stepsize of 0.1, weight decay parameter 0.01, and temperature 0.001. |