Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Implicit regularization of deep residual networks towards neural ODEs
Authors: Pierre Marion, Yu-Han Wu, Michael Eli Sander, Gérard Biau
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 5 NUMERICAL EXPERIMENTS We now present numerical experiments to validate our theoretical findings, using both synthetic and real-world data. Our code is available on GitHub (see Appendix E for details and additional plot). 5.1 SYNTHETIC DATA We consider the residual network (3) with the initialization scheme of Section 3. The activation function is GELU (Hendrycks & Gimpel, 2016), which is a smooth approximation of ReLU: $x \mapsto \max(x, 0)$. The sample points $(x_i, y_i)_{1 \leq i \leq n}$ follow independent standard Gaussian distributions. The mean-squared error is minimized using full-batch gradient descent. The following experiments exemplify the large-depth ($t \in [0, T]$, $L \to \infty$) and long-time ($t \to \infty$, $L$ finite) limits. ... 5.2 REAL-WORLD DATA We now investigate the properties of deep residual networks on the CIFAR-10 dataset (Krizhevsky, 2009). ... Table 1 reports the accuracy of the trained network, and whether it has Lipschitz continuous (or smooth) weights after training, depending on the activation function σ and on the initialization scheme. |
| Researcher Affiliation | Academia | Pierre Marion, Yu-Han Wu: LPSM, Sorbonne Université, CNRS, Paris, France; Michael E. Sander: DMA, ENS, CNRS, Paris, France; Gérard Biau: LPSM, Sorbonne Université, CNRS, Paris, France |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code is available on GitHub (see Appendix E for details and additional plot). |
| Open Datasets | Yes | 5.2 REAL-WORLD DATA We now investigate the properties of deep residual networks on the CIFAR-10 dataset (Krizhevsky, 2009). |
| Dataset Splits | No | The paper mentions using the CIFAR-10 dataset and training for a certain number of iterations/epochs, but it does not provide specific details on how the dataset was split into training, validation, or test sets (e.g., percentages, sample counts, or references to predefined splits). |
| Hardware Specification | No | This work was granted access to the HPC resources of IDRIS under the allocation 2020-[AD011012073] made by GENCI. This mentions an HPC resource but does not provide specific hardware details (e.g., GPU/CPU models, memory sizes). |
| Software Dependencies | No | We use PyTorch (Paszke et al., 2019). This mentions a software dependency but does not specify its version number or any other software dependencies with versions. |
| Experiment Setup | Yes | 5.1 SYNTHETIC DATA ... Large-depth limit. We take $n = 100$, $d = 16$, $m = 32$. We train for 500 iterations, and set the learning rate to $L \times 10^{-2}$. ... Long-time limit. We take $n = 50$, $d = 16$, $m = 64$, $L = 64$, and train for 80,000 iterations with a learning rate of $5L \times 10^{-3}$. 5.2 REAL-WORLD DATA ... The model is trained using stochastic gradient descent on the cross-entropy loss for 180 epochs. The initial learning rate is $4 \times 10^{-2}$ and is gradually decreased using a cosine learning rate scheduler. |
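The synthetic setup is quoted only in outline: a residual network "(3)" with GELU activations, Gaussian inputs, widths $d = 16$ and $m = 32$, and depth $L$ sent to infinity. The sketch below illustrates what that large-depth limit means, under loudly stated assumptions. Equation (3) and the initialization scheme of Section 3 are not reproduced on this page, so the update $h_{k+1} = h_k + \frac{1}{L} V(s_k)\,\mathrm{GELU}(W(s_k) h_k)$ with $s_k = k/L$, the $1/\sqrt{\text{fan-in}}$ weight scaling, and the smooth initialization that linearly interpolates between two Gaussian matrices are hypothetical stand-ins, not the authors' architecture. All helper names are illustrative. With weights that vary smoothly in $s$, this network is an explicit Euler discretization of a neural ODE, so outputs computed at successive depths should draw closer as $L$ grows.

```python
import math
import random

# Hypothetical sketch: the residual update and the smooth initialization below
# are assumptions for illustration, not the paper's equation (3) or Section 3.


def gelu(x):
    """Exact GELU activation: x * Phi(x), with Phi the standard Gaussian CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def random_matrix(rng, rows, cols, std):
    """Matrix with i.i.d. N(0, std^2) entries, stored as a list of rows."""
    return [[rng.gauss(0.0, std) for _ in range(cols)] for _ in range(rows)]


def matvec(A, x):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def lerp(A, B, s):
    """Entrywise (1 - s) * A + s * B: a weight matrix that varies smoothly in s."""
    return [[(1.0 - s) * a + s * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def resnet_forward(x, L, W0, W1, V0, V1):
    """Depth-L residual network with 1/L scaling (hypothetical form):
    h_{k+1} = h_k + (1/L) * V(s_k) gelu(W(s_k) h_k), s_k = k / L,
    i.e. one explicit Euler step of size 1/L for dh/ds = V(s) gelu(W(s) h).
    """
    h = list(x)
    for k in range(L):
        s = k / L
        W, V = lerp(W0, W1, s), lerp(V0, V1, s)  # W is m x d, V is d x m
        update = matvec(V, [gelu(z) for z in matvec(W, h)])
        h = [hi + ui / L for hi, ui in zip(h, update)]
    return h


def make_params(d=16, m=32, seed=0):
    """Gaussian input x and endpoint weights W(0), W(1), V(0), V(1)."""
    rng = random.Random(seed)
    W0 = random_matrix(rng, m, d, 1.0 / math.sqrt(d))
    W1 = random_matrix(rng, m, d, 1.0 / math.sqrt(d))
    V0 = random_matrix(rng, d, m, 1.0 / math.sqrt(m))
    V1 = random_matrix(rng, d, m, 1.0 / math.sqrt(m))
    x = [rng.gauss(0.0, 1.0) for _ in range(d)]
    return x, (W0, W1, V0, V1)


def max_abs_diff(a, b):
    """Largest coordinate-wise gap between two output vectors."""
    return max(abs(u - v) for u, v in zip(a, b))


# Same smooth weight functions, discretized at increasing depths.
x, params = make_params()
outputs = {L: resnet_forward(x, L, *params) for L in (32, 64, 128, 256)}
for L in (32, 128):
    gap = max_abs_diff(outputs[L], outputs[2 * L])
    print(f"depth {L} vs {2 * L}: max |difference| = {gap:.2e}")
```

Because the explicit Euler scheme has $O(1/L)$ global error, the gap between depths 128 and 256 should come out several times smaller than the gap between depths 32 and 64. This illustrates only the forward-pass limit; it does not reproduce the paper's training experiments.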
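The Experiment Setup row gives learning rates that scale linearly with the depth $L$ in the synthetic experiments, plus a cosine-decayed rate starting at $4 \times 10^{-2}$ for CIFAR-10. A minimal sketch of these values follows. The cosine formula is the standard annealing rule, whose closed form is the one documented for PyTorch's `torch.optim.lr_scheduler.CosineAnnealingLR`. The choices to decay to zero over the full 180 epochs and to update once per epoch are assumptions, since the quoted text does not state them. The function names are hypothetical.

```python
import math


def synthetic_lr_large_depth(L):
    """Large-depth synthetic experiment: learning rate L * 1e-2 (linear in depth L)."""
    return L * 1e-2


def synthetic_lr_long_time(L=64):
    """Long-time synthetic experiment: learning rate 5L * 1e-3, with L = 64 in the quoted setup."""
    return 5 * L * 1e-3


def cifar_cosine_lr(epoch, base_lr=4e-2, total_epochs=180, min_lr=0.0):
    """Cosine-annealed learning rate at the start of a given epoch.

    Assumptions (not stated in the quoted setup): the rate decays from base_lr
    to min_lr = 0 over all 180 epochs and is updated once per epoch, matching the
    closed form of torch.optim.lr_scheduler.CosineAnnealingLR with T_max=180.
    """
    progress = min(epoch, total_epochs) / total_epochs
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


for L in (16, 64, 256):
    print(f"L = {L}: large-depth learning rate = {synthetic_lr_large_depth(L):.2f}")
print(f"long-time learning rate (L = 64) = {synthetic_lr_long_time():.2f}")
print([round(cifar_cosine_lr(e), 4) for e in (0, 45, 90, 135, 180)])
```

Scaling the step size with $L$ is consistent with the $1/L$ factor on each residual branch assumed in the previous sketch; whether the paper motivates it exactly this way is not stated in the quoted excerpts.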