Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Unifying Self-Supervised Clustering and Energy-Based Models
Authors: Emanuele Sansone, Robin Manhaeve
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our theoretical findings are substantiated through experiments on synthetic and real-world data, including SVHN, CIFAR10, and CIFAR100, demonstrating that our objective function allows to jointly train a backbone network in a discriminative and generative fashion, consequently outperforming existing self-supervised learning strategies in terms of clustering, generation and out-of-distribution detection performance by a wide margin. |
| Researcher Affiliation | Academia | Emanuele Sansone EMAIL Department of Electrical Engineering (ESAT) KU Leuven Robin Manhaeve EMAIL Department of Computer Science KU Leuven |
| Pseudocode | Yes | Algorithm 1: GEDI Training. |
| Open Source Code | Yes | The code is publicly available at https://github.com/emsansone/GEDI.git. |
| Open Datasets | Yes | Our theoretical findings are substantiated through experiments on synthetic and real-world data, including SVHN, CIFAR10, and CIFAR100... The code is publicly available at https://github.com/emsansone/GEDI.git. |
| Dataset Splits | Yes | Table 2: Clustering performance in terms of normalized mutual information (NMI) on test set (moons and circles). Higher values indicate better clustering performance. Mean and standard deviations are computed from 5 different runs. Table 6: The median and standard deviation of the accuracy and NMI of GEDI and SwAV on the MNIST test set after training on the addition dataset. |
| Hardware Specification | No | The computational resources and services used in this work were provided by the computing infrastructure in the Electrical Engineering Department (PSI group) and the Department of Computer Science (DTAI group) at KU Leuven. |
| Software Dependencies | No | We train GEDI for 7k iterations using Adam optimizer with learning rate 1e-3... We use existing code both as a basis to build our solution and also to run the experiments for the different baselines. In particular, we use the code from (Duvenaud et al., 2021) for training energy-based models and the repository from (da Costa et al., 2022) for all self-supervised approaches. |
| Experiment Setup | Yes | We train GEDI for 7k iterations using Adam optimizer with learning rate 1e-3. We train JEM, Barlow, SwAV, GEDI no gen and GEDI using Adam optimizer with learning rate 1e-4 and batch size 64 for 20, 200 and 200 epochs for each respective dataset (SVHN, CIFAR-10 and CIFAR-100). Further details about the hyperparameters are available in the Supplementary Material (Section I). |
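As a quick reference, the reported training setup can be collected into a small Python sketch. The values are transcribed from the quotes above; the variable and key names are ours, not the authors', and this is a summary of the configuration rather than the authors' actual training code:

```python
# Hyperparameters as reported in the paper's experiment setup.
# Names here are illustrative; see the repository at
# https://github.com/emsansone/GEDI.git for the authors' implementation.

# Synthetic experiments (moons, circles): GEDI alone, iteration-based.
synthetic_setup = {
    "optimizer": "Adam",
    "learning_rate": 1e-3,
    "iterations": 7_000,
}

# Image benchmarks: JEM, Barlow, SwAV, GEDI (no gen) and GEDI share one
# recipe, differing only in the number of epochs per dataset.
image_setup = {
    "optimizer": "Adam",
    "learning_rate": 1e-4,
    "batch_size": 64,
    "epochs": {"SVHN": 20, "CIFAR-10": 200, "CIFAR-100": 200},
}
```

Note that hardware details and exact software versions are not reported, so runtimes and minor numerical differences across environments should be expected when reproducing these settings.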