Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Super Consistency of Neural Network Landscapes and Learning Rate Transfer
Authors: Lorenzo Noci, Alexandru Meterez, Thomas Hofmann, Antonio Orvieto
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We corroborate our claims with a substantial suite of experiments, covering a wide range of datasets and architectures: from ResNets and Vision Transformers trained on benchmark vision datasets to Transformers-based language models trained on WikiText. |
| Researcher Affiliation | Academia | Lorenzo Noci (1), Alexandru Meterez (3, 4, 5), Thomas Hofmann (1), Antonio Orvieto (2, 3, 4). Affiliations: (1) ETH Zürich, (2) ELLIS Tübingen, (3) MPI for Intelligent Systems, (4) Tübingen AI Center, (5) Harvard University |
| Pseudocode | No | The paper does not contain clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | We will release the code upon acceptance. |
| Open Datasets | Yes | We corroborate our claims with a substantial suite of experiments, covering a wide range of datasets and architectures: from ResNets and Vision Transformers trained on benchmark vision datasets to Transformers-based language models trained on WikiText. (...) we train a residual network on CIFAR-10 (a 10 classes image classification task) using cross-entropy loss |
| Dataset Splits | No | The paper mentions training on various datasets and uses terms like 'batch size' and 'epochs', implying data splits. However, it does not explicitly state the specific train/validation/test percentages or sample counts for the standard datasets used (e.g., CIFAR-10, ImageNet), nor does it provide external links to these splits. While a specific stratified subset of CIFAR-10 is mentioned for one experiment, its full train/val/test split details are not provided. |
| Hardware Specification | Yes | The experiments were run on A100 and H100 GPUs, with 80GB VRAM. |
| Software Dependencies | No | "Our implementation is based on the implementation provided by Yang et al. [6], with the addition of the residual scaling. This uses a different parametrization from the one reported in Table 1 but equivalent dynamics, obtainable using their abc-rule." "The implementations of our models are done in PyTorch." No specific version numbers for PyTorch or other libraries are provided. |
| Experiment Setup | Yes | Figure 1: Other parameters: B = 128, epochs = 50. (...) Model: 3-layer ConvNet, τ = 0, η0 = 0.7 (optimal). Details in Sec. J. (...) Parameters: batch size = 128, epochs = 20 for the µP/NTP models and 10 for the random feature model, dataset: CIFAR-10, without data augmentation. (Figure 26 caption) (...) HPs: 2 layers, 2 heads, 20 epochs, batch size 512, 100 warmup steps, sequence length 35. (Figure 16 caption) |