Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Logistic-Normal Likelihoods for Heteroscedastic Label Noise

Authors: Erik Englesson, Amir Mehrpanah, Hossein Azizpour

TMLR 2023 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate the effectiveness of the method by measuring its robustness against label noise in classification. We perform enlightening experiments exploring the inner workings of the method, including sensitivity to hyperparameters, ablation studies, and other insightful analyses. We empirically study the proposed loss on several datasets with synthetic and natural noise, where we show improved robustness to label noise compared to recent works; see Section 5.
Researcher Affiliation | Academia | Erik Englesson EMAIL, KTH Royal Institute of Technology; Amir Mehrpanah EMAIL, KTH Royal Institute of Technology; Hossein Azizpour EMAIL, KTH Royal Institute of Technology
Pseudocode | No | The paper describes the methodology using mathematical derivations and textual explanations but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | Our code is available at: https://github.com/ErikEnglesson/Logistic-Normal
Open Datasets | Yes | We implement our method and the baselines in the same code base and compare on the following datasets: Two Moons & Circles, MNIST (Deng, 2012), CIFAR-10 & CIFAR-100 (Krizhevsky et al., 2009), CIFAR-10N & CIFAR-100N (Wei et al., 2022), and Clothing1M (Xiao et al., 2015).
Dataset Splits | Yes | We use 10% of the training set of MNIST and CIFAR as a noisy validation set. For Clothing1M, we use the provided validation and test sets.
Hardware Specification | No | All experiments were performed using the supercomputing resource Berzelius provided by the National Supercomputer Centre at Linköping University and the Knut and Alice Wallenberg foundation. The paper names this supercomputing resource but does not specify GPU/CPU models or other hardware details.
Software Dependencies | No | We implement our method using the TensorFlow Probability (Dillon et al., 2017) library. The paper names the library but does not provide version numbers for TensorFlow Probability or TensorFlow itself.
Experiment Setup | Yes | We use a learning rate of 0.0001 for synthetic datasets, 0.001 for MNIST and Clothing1M, and 0.01 for the CIFAR datasets. We show that our method performs well under different optimizers by using gradient descent for the synthetic datasets, Adam for MNIST (batch size 256), and SGD with Nesterov momentum of 0.9 for Clothing1M (batch size 32) and the CIFAR datasets (batch size 128). We use a weight decay of 1e-3 and 5e-4 for Clothing1M and the CIFAR datasets, respectively, but no such regularization for the other datasets. We train for 2000, 100, 10, and 300 epochs for the synthetic datasets, MNIST, Clothing1M, and CIFAR, respectively.
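The 10% noisy-validation split quoted under Dataset Splits can be sketched as follows. This is an illustrative sketch, not the authors' code: the dataset size (50,000, as in CIFAR) and the RNG seed are assumptions.

```python
import numpy as np

# Illustrative sketch of holding out 10% of a noisy training set as a
# validation set, as described for MNIST and CIFAR. Sizes and seed are
# assumptions, not taken from the released code base.
rng = np.random.default_rng(0)           # seed chosen arbitrarily
n_train = 50_000                         # e.g. CIFAR-10 training set size
indices = rng.permutation(n_train)       # shuffle example indices once

n_val = n_train // 10                    # 10% held out (labels stay noisy)
val_idx, train_idx = indices[:n_val], indices[n_val:]

print(len(train_idx), len(val_idx))      # 45000 5000
```

Because the split is taken from the (noisy) training labels, model selection on this validation set does not require any clean labels.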
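The per-dataset hyperparameters quoted under Experiment Setup can be collected into a single lookup table. The key names and structure below are assumptions for illustration; only the numeric values come from the quoted text.

```python
# Illustrative summary of the reported per-dataset hyperparameters.
# Key names ("lr", "optimizer", ...) are assumptions, not the authors' code.
CONFIGS = {
    "synthetic":  {"lr": 1e-4, "optimizer": "gradient_descent",
                   "batch_size": None, "weight_decay": 0.0, "epochs": 2000},
    "mnist":      {"lr": 1e-3, "optimizer": "adam",
                   "batch_size": 256, "weight_decay": 0.0, "epochs": 100},
    "cifar":      {"lr": 1e-2, "optimizer": "sgd_nesterov_momentum_0.9",
                   "batch_size": 128, "weight_decay": 5e-4, "epochs": 300},
    "clothing1m": {"lr": 1e-3, "optimizer": "sgd_nesterov_momentum_0.9",
                   "batch_size": 32, "weight_decay": 1e-3, "epochs": 10},
}

print(CONFIGS["cifar"])  # lr 0.01, SGD + Nesterov 0.9, batch 128, wd 5e-4
```

Note the inverse relation between dataset size and epoch count: Clothing1M (about 1M images) is trained for only 10 epochs, while the small synthetic datasets run for 2000.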