Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Loss Functions and Operators Generated by f-Divergences
Authors: Vincent Roulet, Tianlin Liu, Nino Vieillard, Michael Eli Sander, Mathieu Blondel
ICML 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To evaluate different f-divergence generated losses, we apply them to tasks of different data modalities, including image classification (Section 4.1) and text generation (Section 4.2). These experiments also cover different training strategies: from scratch, finetuning, and distillation. |
| Researcher Affiliation | Industry | 1Google DeepMind. Correspondence to: Mathieu Blondel, Vincent Roulet <EMAIL, EMAIL>. |
| Pseudocode | Yes | Algorithm 1 Computing f-softmax and f-softargmax |
| Open Source Code | No | The paper mentions a GitHub link http://github.com/google-deepmind/nanodo in the context of describing the pretraining method for NanoDO models, which are third-party tools used in their experiments. It does not explicitly state that the code for the f-divergence generated losses described in this paper is made publicly available by the authors. |
| Open Datasets | Yes | We apply different f-divergence generated losses to train a ResNet50 model (He et al., 2016) on the ImageNet-2012 dataset (Russakovsky et al., 2015). We used the same pretraining method as the NanoDO models (Liu et al., 2024; Wortsman et al., 2024), a set of well-tuned decoder-only transformer models trained on the public C4 dataset (Raffel et al., 2020). ...on a text summarization task (Narayan et al., 2018). For SFT, we use a pretrained T5-base model (Raffel et al., 2020) ... on the XSum dataset (Narayan et al., 2018) |
| Dataset Splits | Yes | The ImageNet dataset contains 1.28 million training images and 50,000 validation images, belonging to one of 1,000 classes. |
| Hardware Specification | Yes | This experiment is run on TPU. |
| Software Dependencies | No | The paper mentions JAX in relation to the NanoDO model implementation but does not specify its version or the versions of other critical software libraries used for their own methodologies. |
| Experiment Setup | Yes | We use an SGD optimizer with 0.9 momentum to train the ResNet50 model for 90 epochs. During the initial 5 epochs, we use a linear warmup that achieves a peak learning rate of 0.2; we then use cosine annealing to reduce the learning rate to 0. The weight decay is set to 10⁻⁴. The batch size is 512. |
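The learning-rate schedule quoted in the Experiment Setup row (5-epoch linear warmup to a peak of 0.2, then cosine annealing to 0 over the remaining 85 of 90 epochs) can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `learning_rate` and the per-epoch granularity are assumptions (in practice such schedules are usually evaluated per training step).

```python
import math

def learning_rate(epoch, peak_lr=0.2, warmup_epochs=5, total_epochs=90):
    """Linear warmup to peak_lr, then cosine annealing to 0.

    Hypothetical per-epoch sketch of the schedule described in the
    paper's experiment setup; not the authors' implementation.
    """
    if epoch < warmup_epochs:
        # Linear warmup: reaches peak_lr at the end of epoch warmup_epochs - 1.
        return peak_lr * (epoch + 1) / warmup_epochs
    # Cosine annealing over the remaining epochs, decaying toward 0.
    progress = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```

For example, the rate is 0.2 at the end of warmup and decays close to 0 by the final epoch.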