Generalization of Scaled Deep ResNets in the Mean-Field Regime
Authors: Yihang Chen, Fanghui Liu, Yiping Lu, Grigorios Chrysos, Volkan Cevher
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We also validate our theoretical results by some numerical experiments in Appendix C.6. |
| Researcher Affiliation | Academia | Yihang Chen EPFL yihang.chen@epfl.ch Fanghui Liu University of Warwick fanghui.liu@warwick.ac.uk Yiping Lu New York University yplu@nyu.edu Grigorios G. Chrysos University of Wisconsin-Madison chrysos@wisc.edu Volkan Cevher EPFL volkan.cevher@epfl.ch |
| Pseudocode | No | No pseudocode or algorithm blocks were found in the paper. |
| Open Source Code | No | The paper does not provide an explicit statement or link for open-source code for the described methodology. |
| Open Datasets | Yes | We validate our findings on the toy dataset Two Spirals, where the data dimension d = 2. (A generator sketch for this dataset appears below the table.) |
| Dataset Splits | No | The paper mentions "full-batch training for 1,000 steps on the training dataset of size n_train, and test the resulting model on the test dataset of size n_test = 1024" but does not specify a separate validation split. |
| Hardware Specification | No | No specific hardware specifications (e.g., GPU/CPU models, memory details) used for running experiments are provided in the paper. |
| Software Dependencies | No | The paper mentions "a neural ODE model (Poli et al., 2021)", the "Adam optimizer", and the "tanh activation function", but does not provide version numbers for any software dependencies (e.g., Python, PyTorch, or specific library versions). |
| Experiment Setup | Yes | We use a neural ODE model (Poli et al., 2021) to approximate the infinite-depth ResNets, where we take the discretization L = 10. The neural ODE model and the output layer are both parametrized by a two-layer network with the tanh activation function, and the hidden dimension is M = K = 20. The parameters of the ResNet encoder and the output layer are jointly trained by the Adam optimizer with an initial learning rate 0.01. We perform full-batch training for 1,000 steps on the training dataset of size n_train, and test the resulting model on the test dataset of size n_test = 1024 by the 0-1 classification loss. We run experiments over 3 seeds and report the mean. (A hedged reconstruction of this setup is sketched below the table.) |
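
The rows above quote the paper's use of the Two Spirals toy dataset but not its exact construction. The sketch below is a common two-spirals generator, included for illustration only; the spiral parametrization, noise level, and the ±1 label encoding are assumptions, not details taken from the paper.

```python
# Hypothetical two-spirals generator; the paper only states d = 2, so the
# parametrization, noise level, and +-1 labels below are assumptions.
import numpy as np

def two_spirals(n, noise=0.1, seed=0):
    rng = np.random.default_rng(seed)
    n_half = n // 2
    theta = np.sqrt(rng.uniform(size=n_half)) * 3 * np.pi   # angle along each spiral arm
    radius = theta / (3 * np.pi)                             # radius grows with the angle
    pos = np.stack([radius * np.cos(theta), radius * np.sin(theta)], axis=1)
    neg = -pos                                               # second arm: rotate the first by pi
    X = np.concatenate([pos, neg]) + noise * rng.normal(size=(2 * n_half, 2))
    y = np.concatenate([np.ones(n_half), -np.ones(n_half)])  # +-1 class labels
    return X.astype(np.float32), y.astype(np.float32)
```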
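
The following is a minimal PyTorch sketch of the quoted experiment setup. It replaces the torchdyn-based neural ODE (Poli et al., 2021) used in the paper with a plain Euler discretization over L = 10 steps and shares one velocity network across steps; the widths M = K = 20, tanh activations, Adam with initial learning rate 0.01, 1,000 full-batch steps, and 0-1 test loss follow the quote, while the input lifting layer, the logistic training loss, and the weight sharing are assumptions.

```python
# Minimal sketch of the reported setup, assuming an Euler-discretized
# residual flow in place of the paper's torchdyn neural ODE model.
import torch
import torch.nn as nn

d, M, K, L = 2, 20, 20, 10            # input dim, hidden widths, discretization steps

class ScaledResNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.lift = nn.Linear(d, M)                      # lift the 2-d input to width M (assumed)
        self.velocity = nn.Sequential(                   # two-layer velocity field with tanh
            nn.Linear(M, M), nn.Tanh(), nn.Linear(M, M))
        self.readout = nn.Sequential(                    # two-layer output network
            nn.Linear(M, K), nn.Tanh(), nn.Linear(K, 1))

    def forward(self, x):
        h = self.lift(x)
        for _ in range(L):                               # Euler steps: h <- h + (1/L) * f(h)
            h = h + self.velocity(h) / L
        return self.readout(h).squeeze(-1)

def run(X_train, y_train, X_test, y_test, seed=0):
    """Train for 1,000 full-batch Adam steps and return the 0-1 test error."""
    torch.manual_seed(seed)
    model = ScaledResNet()
    opt = torch.optim.Adam(model.parameters(), lr=0.01)  # initial learning rate 0.01
    for _ in range(1000):                                # full-batch training for 1,000 steps
        opt.zero_grad()
        # logistic loss on +-1 labels; the paper's training loss is an assumption here
        loss = nn.functional.soft_margin_loss(model(X_train), y_train)
        loss.backward()
        opt.step()
    with torch.no_grad():
        pred = torch.sign(model(X_test))
        return (pred != y_test).float().mean().item()    # 0-1 classification loss
```

Following the quoted protocol, one would convert the generator's outputs to tensors (e.g., `torch.from_numpy`), draw n_train training points and n_test = 1024 test points, call `run` for 3 seeds, and report the mean 0-1 test error.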