Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

GMM-based VAE model with Normalising Flow for effective stochastic segmentation

Authors: Conghui Li, Chern Hong Lim, Xin Wang

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Experiments on LIDC, Crack500, and Cityscapes datasets show that our approach outperformed state-of-the-art in curvilinear structure and medical image segmentation.
Researcher Affiliation	Academia	Conghui Li, School of IT, Monash University Malaysia, Bandar Sunway, Selangor, 47500, Malaysia
Pseudocode	No	The paper describes the methodology using mathematical formulations and descriptive text, but does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code	No	Because of the confidential consideration, repository will be access after published.
Open Datasets	Yes	Experiments on LIDC, Crack500, and Cityscapes datasets show that our approach outperformed state-of-the-art in curvilinear structure and medical image segmentation. The Lung Image Database Consortium image collection (LIDC-IDRI) consists of thoracic computed tomography (CT) scans for diagnostic and lung cancer screening, with annotated lesions provided by multiple radiologists [2]. The Crack500 dataset is designed for pixel-wise pavement crack segmentation and consists of 500 high-resolution images, resulting in 3,368 cropped images of size 360 × 640 [43]. Cityscapes is a standard benchmark dataset for multi-class semantic segmentation [11].
Dataset Splits	Yes	Following the experimental setting of CCDM, we extracted a total of 15,096 slices of size 128 × 128, and divided the dataset into training, validation, and testing sets with a ratio of 60:20:20. The dataset is split into 1,897 training, 347 validation, and 1,124 testing images. It contains 2,975 training images and 500 validation images, each with a resolution of 512 × 1024, annotated across 19 semantic classes.
Hardware Specification	Yes	All models are trained using the Adam optimizer for 500 epochs with a batch size of 32 and all experiments are programmed by Pytorch 2.4.1 and conducted using NVIDIA A100 Tensor Core GPU.
Software Dependencies	Yes	All models are trained using the Adam optimizer for 500 epochs with a batch size of 32 and all experiments are programmed by Pytorch 2.4.1 and conducted using NVIDIA A100 Tensor Core GPU.
Experiment Setup	Yes	All models are trained using the Adam optimizer for 500 epochs with a batch size of 32 and all experiments are programmed by Pytorch 2.4.1 and conducted using NVIDIA A100 Tensor Core GPU.
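The split figures quoted in the Dataset Splits row can be sanity-checked with a few lines of arithmetic. The sketch below uses a hypothetical helper, `split_counts`, that is not taken from the paper. It confirms that the three Crack500 counts sum to the 3,368 cropped images reported under Open Datasets, and shows one way to turn the 60:20:20 ratio into whole-slice counts for the 15,096 extracted slices. The rounding rule (floor every split except the last, which absorbs the remainder) is an assumption, since the quoted excerpt does not say how fractional counts were resolved.

```python
def split_counts(total, ratios):
    """Turn integer ratio weights into integer split sizes.

    Assumption (not stated in the source): every split except the last is
    floored, and the last split absorbs the rounding remainder.
    """
    weight_sum = sum(ratios)
    sizes = [total * r // weight_sum for r in ratios[:-1]]
    sizes.append(total - sum(sizes))
    return sizes


# Counts quoted in the Dataset Splits row.
crack500 = {"train": 1897, "val": 347, "test": 1124}
cityscapes = {"train": 2975, "val": 500}

# 60:20:20 split of the 15,096 extracted slices under the assumed rounding rule.
slice_sizes = split_counts(15096, [60, 20, 20])

if __name__ == "__main__":
    print("Crack500 total:", sum(crack500.values()))  # 3368, matching the cropped-image count
    print("Cityscapes train+val:", sum(cityscapes.values()))  # 3475
    print("15,096 slices at 60:20:20:", slice_sizes)  # [9057, 3019, 3020]
```

Because 60% of 15,096 is 9,057.6, any exact reproduction depends on the rounding convention the original code used; a different rule (for example, assigning the leftover slice to the training set) would shift the counts by one.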