Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Gompertz Linear Units: Leveraging Asymmetry for Enhanced Learning Dynamics
Authors: Indrashis Das, Mahmoud Safari, Steven Adriaensen, Frank Hutter
NeurIPS 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments across diverse tasks, including Image Classification, Language Modeling, Semantic Segmentation, Object Detection, Instance Segmentation, and Diffusion, highlight GoLU's superior performance relative to state-of-the-art activation functions, establishing GoLU as a robust alternative to existing activation functions. |
| Researcher Affiliation | Collaboration | Indrashis Das, University of Freiburg; Mahmoud Safari, University of Freiburg; Steven Adriaensen, University of Freiburg; Frank Hutter, Prior Labs & ELLIS Institute Tübingen & University of Freiburg |
| Pseudocode | No | The paper describes the methodology and properties of GoLU in Sections 2.1 and 2.2, using equations and descriptive text, but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | To facilitate reproducibility, we have made our code available at https://github.com/automl/GoLU. |
| Open Datasets | Yes | Extensive experiments across diverse tasks, including Image Classification, Language Modeling, Semantic Segmentation, Object Detection, Instance Segmentation, and Diffusion, highlight GoLU's superior performance relative to state-of-the-art activation functions, establishing GoLU as a robust alternative to existing activation functions. We begin with image classification, training ResNet-18, 34, 50 [He et al., 2016], WideResNet-50-2 [Zagoruyko and Komodakis, 2016], DenseNet-121 [Huang et al., 2017], EfficientNet-B0 [Tan and Le, 2019], TinyViT Wu et al. [2022], ViT-B/32 and ViT-B/16 [Dosovitskiy et al., 2020] on ImageNet-1k Deng et al. [2009]. We then extend our experiments to language modeling. We train babyGPT on the TinyStories (TS) [Eldan and Li, 2023] dataset and GPT2-S [Radford et al., 2019] on the OpenWebText (OWT) [Gokaslan et al., 2019] dataset, leveraging the nanoGPT repository Karpathy [2023]. Additionally, we assess GoLU's performance on Semantic Segmentation (DeepLabV3 Chen et al. [2017]), Object Detection (Faster R-CNN-FPN Ren et al. [2015], RetinaNet-FPN Lin [2017]), and Instance Segmentation (Mask R-CNN-FPN He et al. [2017]) on MS-COCO Lin et al. [2014], leveraging our pre-trained ResNet-50 backbone on ImageNet-1k. Further, we test GoLU on Denoising Diffusion Probabilistic Models Ho et al. [2020] on the CelebA Liu et al. [2015] dataset. |
| Dataset Splits | Yes | The TinyStories dataset consists of 2,119,719 data points in the training set and 21,990 in the test set, while the OpenWebText dataset has 8,009,762 data points in the training set and 4,007 data points in the test set. Both babyGPT and nanoGPT have a vocabulary size of 50,304 and a maximum sequence length of 1024. The MS-COCO dataset with PASCAL-VOC labels contains 92,518 data points in the training set and 5,000 data points in the test set. Unlike Semantic Segmentation, the MS-COCO dataset for object detection contains 117,266 images in the training set and 5,000 images in the test set. The CelebA dataset comprises 162,770 training images and 19,867 test images of human faces. |
| Hardware Specification | Yes | All experiments in this section were conducted on NVIDIA A100 GPUs, with an approximate total compute time of 112K GPU hours, except for TinyViT, which was executed on an NVIDIA H100 GPU with a total runtime of 455 GPU hours. [...] All runs were executed on a single NVIDIA L40S GPU with a total runtime of roughly 1750 GPU hours. |
| Software Dependencies | No | The paper mentions software components like PyTorch, CUDA, Torchvision, the timm library, the nanoGPT repository, and the Fairseq framework. However, it does not provide specific version numbers for these software dependencies, which is required for reproducibility. |
| Experiment Setup | Yes | ResNet-18, 34, 50, WideResNet-50-2 and DenseNet-121 are trained for 90 epochs with a batch size of 256, SGD with momentum = 0.9 (Nesterov for WRN-50-2 and DN-121), learning rate 0.1, and weight decay 1e-4. Further, a Step learning rate scheduler is applied that multiplies the learning rate by gamma = 0.1 every 30 epochs. EfficientNet-B0 is trained using the timm library for 450 epochs with a batch size of 1536 using RMSProp Hinton et al. [2012] with an initial learning rate of 0.048 and a weight decay of 1e-5. ViT models are trained for 300 epochs with a batch size of 4096 using AdamW Loshchilov and Hutter [2017] with an initial learning rate of 3e-3 and weight decay of 0.3. |
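The Pseudocode row notes that GoLU is specified only through equations. As a minimal sketch, assuming the definition stated in the paper, GoLU(x) = x · Gompertz(x) with Gompertz(x) = exp(-exp(-x)), the activation can be written in plain Python. The function names `gompertz` and `golu` are illustrative and are not taken from the authors' repository.

```python
import math


def gompertz(x: float) -> float:
    """Gompertz gate exp(-exp(-x)): rises from 0 (x -> -inf) to 1 (x -> +inf)."""
    # Clamp the inner exponent so math.exp cannot overflow for very negative x.
    # exp(-exp(700)) already underflows to 0.0, so the clamp leaves results unchanged.
    return math.exp(-math.exp(min(-x, 700.0)))


def golu(x: float) -> float:
    """GoLU(x) = x * Gompertz(x), assuming the paper's definition."""
    return x * gompertz(x)


if __name__ == "__main__":
    for x in (-3.0, -1.0, 0.0, 1.0, 3.0):
        print(f"x = {x:+.1f}  gate = {gompertz(x):.4f}  GoLU = {golu(x):+.4f}")
```

One simple way to see the asymmetry in the title: the logistic sigmoid gate used by SiLU satisfies σ(x) + σ(-x) = 1, so σ(0) = 0.5, whereas Gompertz(0) = e^-1 ≈ 0.368 and Gompertz(x) + Gompertz(-x) ≠ 1 in general. In PyTorch the same element-wise expression would be `x * torch.exp(-torch.exp(-x))`.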
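The Dataset Splits row reports raw train/test counts only. Converting them to held-out fractions makes the split ratios easier to compare across datasets. In this short sketch the counts are copied from that row, while the dictionary keys and the helper `held_out_fraction` are illustrative names chosen here.

```python
# Train/test counts as reported in the Dataset Splits row; keys are illustrative labels.
SPLITS: dict[str, tuple[int, int]] = {
    "TinyStories": (2_119_719, 21_990),
    "OpenWebText": (8_009_762, 4_007),
    "MS-COCO (PASCAL-VOC labels, segmentation)": (92_518, 5_000),
    "MS-COCO (detection)": (117_266, 5_000),
    "CelebA": (162_770, 19_867),
}


def held_out_fraction(name: str) -> float:
    """Share of all reported examples that fall in the test split."""
    train, test = SPLITS[name]
    return test / (train + test)


if __name__ == "__main__":
    for name in SPLITS:
        print(f"{name:45s} test share = {held_out_fraction(name):.2%}")
```

With these counts the test split is roughly 1% of TinyStories, 0.05% of OpenWebText, 4 to 5% of the two MS-COCO variants, and about 11% of CelebA.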
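The Experiment Setup row lists per-architecture hyperparameters and a step learning-rate schedule for the ResNet-style models. The sketch below records those values in a small dataclass and computes the step schedule, assuming 0-indexed epochs and one decay every 30 epochs. `TrainConfig`, `CONFIGS`, and `step_lr` are hypothetical names, and the row does not specify the schedules for EfficientNet-B0 or ViT beyond their initial learning rates, so those are not modeled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TrainConfig:
    """Hypothetical container for the hyperparameters stated in the Experiment Setup row."""
    epochs: int
    batch_size: int
    optimizer: str
    lr: float  # initial learning rate
    weight_decay: float
    momentum: Optional[float] = None  # None where the row gives no value
    nesterov: bool = False


# Values transcribed from the row; the keys are illustrative labels.
CONFIGS = {
    "resnet-18/34/50": TrainConfig(90, 256, "SGD", 0.1, 1e-4, momentum=0.9),
    "wrn-50-2/densenet-121": TrainConfig(90, 256, "SGD", 0.1, 1e-4, momentum=0.9, nesterov=True),
    "efficientnet-b0": TrainConfig(450, 1536, "RMSProp", 0.048, 1e-5),
    "vit": TrainConfig(300, 4096, "AdamW", 3e-3, 0.3),
}


def step_lr(base_lr: float, epoch: int, step_size: int = 30, gamma: float = 0.1) -> float:
    """Learning rate at a 0-indexed epoch: multiplied by gamma once every step_size epochs."""
    return base_lr * gamma ** (epoch // step_size)


if __name__ == "__main__":
    cfg = CONFIGS["resnet-18/34/50"]
    for epoch in (0, 29, 30, 59, 60, cfg.epochs - 1):
        print(f"epoch {epoch:2d}: lr = {step_lr(cfg.lr, epoch):.4g}")
```

For the 90-epoch runs this yields 0.1 for epochs 0 to 29, 0.01 for 30 to 59, and 0.001 for 60 to 89. In PyTorch the corresponding setup would be `torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=1e-4)` combined with `torch.optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1)`, with `scheduler.step()` called once per epoch.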