Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Generalized Gradient Norm Clipping & Non-Euclidean $(L_0,L_1)$-Smoothness

Authors: Thomas Pethick, Wanyun Xie, Mete Erdogan, Kimon Antonakopoulos, Antonio Silveti-Falls, Volkan Cevher

NeurIPS 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We discuss how to instantiate the algorithms for deep learning, which we dub ClippedScion, and demonstrate their properties on image classification and language modeling. The code is available at https://github.com/LIONS-EPFL/ClippedScion. [...] Image classification: We test on a convolutional neural network (CNN) on the CIFAR10 dataset. [...] NanoGPT: We additionally test on NanoGPT Karpathy [2023] in Figure 2.
Researcher Affiliation | Academia | Thomas Pethick, EPFL (LIONS); Wanyun Xie, EPFL (LIONS); Mete Erdogan, EPFL (LIONS); Kimon Antonakopoulos, EPFL (LIONS); Antonio Silveti-Falls, Université Paris-Saclay (CVN); Volkan Cevher, EPFL (LIONS)
Pseudocode | Yes | Algorithm 1 Generalized Gradient Norm Clipping (GGNC) [...] Algorithm 2 Stochastic Short Step Conditional Gradient (S3CG) [...] In Algorithms 3 and 4 of the appendix we specialize Algorithms 1 and 2 to the particular case where X is the max-norm.
Open Source Code | Yes | The code is available at https://github.com/LIONS-EPFL/ClippedScion.
Open Datasets | Yes | We discuss how to instantiate the algorithms for deep learning, which we dub ClippedScion, and demonstrate their properties on image classification and language modeling. [...] Image classification: We test on a convolutional neural network (CNN) on the CIFAR10 dataset. [...] We train a DeiT-base model using the DeiT codebase [Touvron et al., 2021], replacing LayerNorm with RMS norm following [Pethick et al., 2025]. [...] NanoGPT: We additionally test on NanoGPT Karpathy [2023] in Figure 2 with modernizations following [Jordan et al., 2024a]: [...] on the FineWeb dataset (see Table 4 Appendix C for further details).
Dataset Splits | Yes | Image classification: We test on a convolutional neural network (CNN) on the CIFAR10 dataset. [...] Hyperparameters can be found in Table 2 in Appendix C. [...] Table 2: Hyperparameters for the CIFAR10 experiments [...] Dataset CIFAR10 (50000 training examples); batch size 2000; Epochs 80 [...] Dataset ImageNet-1k [...] Batch size 4096; Epochs 200 [...] Dataset FineWeb; batch size 512; block size 1024; Iterations n 5100
Hardware Specification | Yes | CIFAR10 experiments are run on a single A100 NVIDIA GPU, NanoGPT runs are run on 4 H100 NVIDIA GPUs, and ViT experiments use 16 GH200 NVIDIA GPUs.
Software Dependencies | No | The paper mentions several techniques and models (e.g., Adam, RMS norm, GELU, ReLU2, LayerNorm, Newton-Schulz iteration, DeiT codebase, Modula software package) but does not provide specific version numbers for any software libraries or dependencies.
Experiment Setup | Yes | Table 2: Hyperparameters for the CIFAR10 experiments building on airbench [Jordan, 2024]. Hyperparameter Adam (Clipped)Scion Unconst. Scion Unconst. Clipped Scion; Block size (b1, b2, b3) width factor (64, 256, 256); Activation function GELU; Dataset CIFAR10 (50000 training examples); batch size 2000; Epochs 80; Stepsize schedule Linear decay γk = γ(1 − k/n); Averaging parameter α 0.9 0.5; Stepsize γ 1e-3 2 8 2 5 2 2; Initial stepsize γ for decay 2e-3 2 1 2 1; Clipping parameter ρ 12800 1600; Radius r1 / rℓ / rD 1 / 5 / 2000 1 / 5 / 200 1 / 5 / 200
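The CIFAR10 settings quoted in the Experiment Setup row (50000 training examples, batch size 2000, 80 epochs, linear stepsize decay) can be turned into a small worked example. The sketch below is illustrative only and is not taken from the paper's code. It reads the schedule as γk = γ(1 − k/n), where the minus sign is an assumption about the operator. It also assumes that one iteration consumes one full batch, that n is the total number of iterations, and that k counts from 0. The function names `total_iterations` and `linear_decay_stepsize` are hypothetical, and 2e-3 is simply the first value listed under "Initial stepsize γ for decay".

```python
import math


def total_iterations(num_examples: int, batch_size: int, epochs: int) -> int:
    """Total optimizer steps, assuming one step per (possibly partial) batch."""
    steps_per_epoch = math.ceil(num_examples / batch_size)
    return steps_per_epoch * epochs


def linear_decay_stepsize(gamma: float, k: int, n: int) -> float:
    """Assumed reading of the schedule: gamma_k = gamma * (1 - k / n)."""
    if not 0 <= k <= n:
        raise ValueError("k must satisfy 0 <= k <= n")
    return gamma * (1 - k / n)


# CIFAR10 values quoted in the table: 50000 examples, batch size 2000, 80 epochs.
n = total_iterations(num_examples=50_000, batch_size=2_000, epochs=80)

# Stepsize at every iteration k = 0, ..., n - 1, starting from gamma = 2e-3.
schedule = [linear_decay_stepsize(2e-3, k, n) for k in range(n)]

print(n)               # 25 steps per epoch * 80 epochs = 2000
print(schedule[0])     # the initial stepsize, 2e-3
print(schedule[n // 2])  # halfway through training the stepsize is halved
```

Under these assumptions the run takes 2000 iterations, and the stepsize falls linearly from its initial value towards zero at k = n.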