Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

ShapeEmbed: a self-supervised learning framework for 2D contour quantification

Authors: Anna Foix-Romero, Craig Russell, Alexander Krull, Virginie Uhlmann

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We evaluate our method by using a simple logistic regression classifier applied to the latent representation as a downstream shape classification task. We demonstrate that ShapeEmbed outperforms traditional statistics-based as well as learning-based methods on a range of different problems, including computer vision benchmarks and biological imaging datasets.
Researcher Affiliation	Academia	¹European Bioinformatics Institute, European Molecular Biology Laboratory, Cambridge, UK, EMAIL; ²School of Computer Science, University of Birmingham, Birmingham, UK, EMAIL; ³Department of Molecular Life Sciences, University of Zurich, Zurich, CH, EMAIL
Pseudocode	No	The paper describes the proposed approach in Section 3 and its subsections, outlining the steps and components without presenting them in a structured pseudocode or algorithm block.
Open Source Code	Yes	ShapeEmbed is implemented in Python and is available at https://github.com/uhlmanngroup/ShapeEmbed under the MIT license. Further implementation details are provided in Supplementary Section B.
Open Datasets	Yes	MNIST. The MNIST benchmark dataset (GNU GPL, [Deng, 2012]) consists of grayscale images of handwritten digits from 0 to 9, with approximately 7,000 images per class, amounting to a total of 70,000 images. MPEG-7. The MPEG-7 CE-Shape-1 Part B dataset (LGPL-3.0, [mpe, 2009]) is a benchmark for shape matching and retrieval tasks. It consists of 1,400 binary masks of objects belonging to 70 classes, with 20 images per class. BBBC010. The Broad Bioimage Benchmark Collection 10 (BBBC010, no license, [Ljosa et al., 2012]) is a biological imaging dataset designed to test phenotypic profiling at the whole-organism level. MEF. The Mouse Embryonic Fibroblast (MEF, MIT License, [Phillip et al., 2021]) dataset is a challenging biological imaging dataset containing 300 images of multiple cells distributed across three classes: circle-patterned, triangle-patterned, and control (non-patterned) surfaces, with 100 images per class. HeLa Kyoto. The HeLa Kyoto dataset (CellCognition project, CC-BY 4.0 License, [Held et al., 2010]) consists of fluorescence microscopy images of H2B-mCherry-stained HeLa Kyoto cell nuclei... Mouse Osteosarcoma Cells (MOC). The MOC dataset (MIT license, [Miolane et al., 2020]) consists of fluorescence microscopy images of mouse osteosarcoma cells.
Dataset Splits	Yes	In our experiments, we divided each dataset into an 80% / 20% split for training and testing, respectively, relying on stratified sampling [Särndal et al., 2003] to account for class imbalance.
Hardware Specification	Yes	Experiments were conducted on a machine with an Intel(R) Xeon(R) Gold 6342 CPU @ 2.80GHz and an NVIDIA A100 80GB PCIe GPU. In all experiments, we used the ADAM optimizer [Kingma and Ba, 2015] with a learning rate of 10⁻³ for 350 epochs for the datasets we considered.
Software Dependencies	No	ShapeEmbed is implemented in Python using the PyTorch library ([Paszke et al., 2019], BSD-style license available at https://github.com/pytorch/pytorch/blob/main/LICENSE) and is available at https://github.com/uhlmanngroup/ShapeEmbed.
Experiment Setup	Yes	In all experiments, we used the ADAM optimizer [Kingma and Ba, 2015] with a learning rate of 10⁻³ for 350 epochs for the datasets we considered. ... The hyperparameter β allows tuning the model to focus more on feature extraction and reconstruction (smaller β) or on producing a smooth latent space that can be used in a generative context (larger β) [Higgins et al., 2017]. We empirically set it to 10⁻¹⁰ by default, as this value was observed to balance accurate reconstructions and meaningful sampling in the latent space. The hyperparameters γ, δ, and ϵ are all set by default to 10⁻⁵, which was empirically found through hyperparameter tuning.
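
The Dataset Splits entry above quotes an 80% / 20% train/test split with stratified sampling. As a rough illustration of what that means in practice, the following is a minimal sketch in plain Python. It is not the authors' code: the function name `stratified_split`, the fixed seed, and the toy label list (shaped like the MPEG-7 description of 70 classes with 20 items each) are all assumptions made for this example.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Hypothetical helper: split indices so each class keeps ~test_fraction in test."""
    rng = random.Random(seed)  # fixed seed is an assumption, chosen for repeatability

    # Group sample indices by their class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, test_idx = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Take the same fraction from every class so class proportions are preserved.
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Toy labels only: 70 classes with 20 items each, as in the MPEG-7 description.
labels = [c for c in range(70) for _ in range(20)]
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))  # 1120 280
```

Splitting per class rather than over the pooled indices is what keeps rare classes represented in both partitions, which is the stated reason for using stratified sampling on imbalanced datasets.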