Generalization of Scaled Deep ResNets in the Mean-Field Regime
Authors: Yihang Chen, Fanghui Liu, Yiping Lu, Grigorios Chrysos, Volkan Cevher
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We also validate our theoretical results by some numerical experiments in Appendix C.6. |
| Researcher Affiliation | Academia | Yihang Chen EPFL yihang.chen@epfl.ch Fanghui Liu University of Warwick fanghui.liu@warwick.ac.uk Yiping Lu New York University yplu@nyu.edu Grigorios G. Chrysos University of Wisconsin-Madison chrysos@wisc.edu Volkan Cevher EPFL volkan.cevher@epfl.ch |
| Pseudocode | No | No pseudocode or algorithm blocks were found in the paper. |
| Open Source Code | No | The paper does not provide an explicit statement or link for open-source code for the described methodology. |
| Open Datasets | Yes | We validate our findings on the toy dataset Two Spirals, where the data dimension d = 2. (A generator sketch for this dataset appears below the table.) |
| Dataset Splits | No | The paper mentions "full-batch training for 1,000 steps on the training dataset of size n_train, and test the resulting model on the test dataset of size n_test = 1024" but does not specify a separate validation split. |
| Hardware Specification | No | No specific hardware specifications (e.g., GPU/CPU models, memory details) used for running experiments are provided in the paper. |
| Software Dependencies | No | The paper mentions "a neural ODE model (Poli et al., 2021)", the "Adam optimizer", and the "tanh activation function", but does not provide version numbers for any software dependencies (e.g., Python, PyTorch, or specific library versions). |
| Experiment Setup | Yes | We use a neural ODE model (Poli et al., 2021) to approximate the infinite-depth ResNets, where we take the discretization L = 10. The neural ODE model and the output layer are both parametrized by a two-layer network with the tanh activation function, and the hidden dimension is M = K = 20. The parameters of the ResNet encoder and the output layer are jointly trained by the Adam optimizer with an initial learning rate 0.01. We perform full-batch training for 1,000 steps on the training dataset of size n_train, and test the resulting model on the test dataset of size n_test = 1024 by the 0-1 classification loss. We run experiments over 3 seeds and report the mean. (A hedged reconstruction of this setup is sketched below the table.) |
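
The rows above quote the paper's use of the Two Spirals toy dataset but not its exact construction. The sketch below is a common two-spirals generator, included for illustration only; the spiral parametrization, noise level, and the ±1 label encoding are assumptions, not details taken from the paper.

```python
# Hypothetical two-spirals generator; the paper only states d = 2, so the
# parametrization, noise level, and +-1 labels below are assumptions.
import numpy as np

def two_spirals(n, noise=0.1, seed=0):
    rng = np.random.default_rng(seed)
    n_half = n // 2
    theta = np.sqrt(rng.uniform(size=n_half)) * 3 * np.pi   # angle along each spiral arm
    radius = theta / (3 * np.pi)                             # radius grows with the angle
    pos = np.stack([radius * np.cos(theta), radius * np.sin(theta)], axis=1)
    neg = -pos                                               # second arm: rotate the first by pi
    X = np.concatenate([pos, neg]) + noise * rng.normal(size=(2 * n_half, 2))
    y = np.concatenate([np.ones(n_half), -np.ones(n_half)])  # +-1 class labels
    return X.astype(np.float32), y.astype(np.float32)
```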
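
The following is a minimal PyTorch sketch of the quoted experiment setup. It replaces the torchdyn-based neural ODE (Poli et al., 2021) used in the paper with a plain Euler discretization over L = 10 steps and shares one velocity network across steps; the widths M = K = 20, tanh activations, Adam with initial learning rate 0.01, 1,000 full-batch steps, and 0-1 test loss follow the quote, while the input lifting layer, the logistic training loss, and the weight sharing are assumptions.

```python
# Minimal sketch of the reported setup, assuming an Euler-discretized
# residual flow in place of the paper's torchdyn neural ODE model.
import torch
import torch.nn as nn

d, M, K, L = 2, 20, 20, 10            # input dim, hidden widths, discretization steps

class ScaledResNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.lift = nn.Linear(d, M)                      # lift the 2-d input to width M (assumed)
        self.velocity = nn.Sequential(                   # two-layer velocity field with tanh
            nn.Linear(M, M), nn.Tanh(), nn.Linear(M, M))
        self.readout = nn.Sequential(                    # two-layer output network
            nn.Linear(M, K), nn.Tanh(), nn.Linear(K, 1))

    def forward(self, x):
        h = self.lift(x)
        for _ in range(L):                               # Euler steps: h <- h + (1/L) * f(h)
            h = h + self.velocity(h) / L
        return self.readout(h).squeeze(-1)

def run(X_train, y_train, X_test, y_test, seed=0):
    """Train for 1,000 full-batch Adam steps and return the 0-1 test error."""
    torch.manual_seed(seed)
    model = ScaledResNet()
    opt = torch.optim.Adam(model.parameters(), lr=0.01)  # initial learning rate 0.01
    for _ in range(1000):                                # full-batch training for 1,000 steps
        opt.zero_grad()
        # logistic loss on +-1 labels; the paper's training loss is an assumption here
        loss = nn.functional.soft_margin_loss(model(X_train), y_train)
        loss.backward()
        opt.step()
    with torch.no_grad():
        pred = torch.sign(model(X_test))
        return (pred != y_test).float().mean().item()    # 0-1 classification loss
```

Following the quoted protocol, one would convert the generator's outputs to tensors (e.g., `torch.from_numpy`), draw n_train training points and n_test = 1024 test points, call `run` for 3 seeds, and report the mean 0-1 test error.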