Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

LoMix: Learnable Weighted Multi-Scale Logits Mixing for Medical Image Segmentation

Authors: Md Mostafijur Rahman, Radu Marculescu

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We evaluate LoMix on several medical image segmentation datasets. Datasets, additional results and analyses including qualitative visualization are provided in the Supplementary Material. ... Table 1: Synapse 8-organ segmentation with Last Layer (LL), Deep Supervision (DS) [36], MUTATION [21], and our LoMix. DICE scores (%) are reported for Gallbladder (GB), Left kidney (KL), Right kidney (KR), Pancreas (PC), Spleen (SP), and Stomach (SM). ↑ denotes the higher the better and ↓ denotes the lower the better. Results are averaged over at least three runs. Two-sided Wilcoxon signed-rank tests [35] indicate that LoMix significantly outperforms LL and DS at α = 0.05. Best results are shown in bold.
Researcher Affiliation	Academia	Md Mostafijur Rahman, Radu Marculescu; Department of Electrical and Computer Engineering, The University of Texas at Austin, Austin, TX 78703; EMAIL
Pseudocode	Yes	Algorithm 1 outlines the training procedure for our U-shaped segmentation network with the LoMix module. We train end-to-end with AdamW [17], updating both the usual network parameters and the loss weight parameters α. Pseudocode is given in Algorithm 1 below.
Open Source Code	Yes	Our implementation is available at https://github.com/SLDGroup/LoMix.
Open Datasets	Yes	Our two multi-class segmentation datasets are Synapse Multi-organs and ACDC cardiac organs. ... Our binary breast cancer segmentation dataset, BUSI [1], contains 647 images: 437 benign and 210 malignant. Our skin lesion segmentation dataset. Our three polyp segmentation datasets are Kvasir [13] (1,000 images), ClinicDB [3] (612 images), CVC-ColonDB [29] (379 images), and ETIS-LaribPolypDB [28] (196 images). Furthermore, we use ISIC2018 [6] (2,594 images) for skin lesion segmentation.
Dataset Splits	Yes	Following TransUNet [3], 18 scans (2,212 slices) are used for training and 12 for validation/testing. ... We follow the TransUNet protocol using 70 cases (1,930 slices) for training, 10 for validation, and 20 for testing. ... We use 80% of the data for training, 10% for validation, and 10% for testing in BUSI, Kvasir, CVC-ColonDB, ETIS-LaribPolypDB, and ISIC2018 datasets.
Hardware Specification	Yes	Our methods are implemented and evaluated using PyTorch 1.11.0, operating on a single NVIDIA RTX A6000 GPU equipped with 48GB of RAM.
Software Dependencies	Yes	Our methods are implemented and evaluated using PyTorch 1.11.0, operating on a single NVIDIA RTX A6000 GPU equipped with 48GB of RAM.
Experiment Setup	Yes	Model optimization is achieved with AdamW [17] optimizer with learning rate and weight decay set to 1e-4. ... For multi-class segmentation in Synapse Multi-organs and ACDC datasets, we use an input size of 224 × 224, and optimize the combined Cross-entropy (β=0.3) + DICE (γ=0.7) loss. We train models for 300 and 400 epochs with a batch size of 6 and 12 for Synapse and ACDC datasets, respectively.
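
The Experiment Setup row reports a combined loss: cross-entropy weighted by β = 0.3 plus DICE weighted by γ = 0.7. The following is a minimal sketch of that weighting in plain Python, not the authors' implementation. The function names, the soft-Dice formulation, the per-class averaging, and the epsilon smoothing are all assumptions made for illustration; only the two weights come from the quoted text.

```python
import math


def cross_entropy(probs, targets, eps=1e-7):
    """Mean cross-entropy over pixels.

    probs[i][c] is the predicted probability of class c at pixel i;
    targets[i] is the integer ground-truth class of pixel i.
    """
    total = 0.0
    for p, t in zip(probs, targets):
        # Clamp to eps so log(0) never occurs (assumed smoothing choice).
        total += -math.log(max(p[t], eps))
    return total / len(targets)


def dice_loss(probs, targets, num_classes, eps=1e-7):
    """One minus the soft Dice coefficient, averaged over classes (assumed variant)."""
    dice_sum = 0.0
    for c in range(num_classes):
        intersection = sum(p[c] for p, t in zip(probs, targets) if t == c)
        pred_mass = sum(p[c] for p in probs)
        true_mass = sum(1 for t in targets if t == c)
        dice_sum += (2.0 * intersection + eps) / (pred_mass + true_mass + eps)
    return 1.0 - dice_sum / num_classes


def combined_loss(probs, targets, num_classes, beta=0.3, gamma=0.7):
    """Weighted sum beta * CE + gamma * DICE, using the weights quoted above."""
    return (beta * cross_entropy(probs, targets)
            + gamma * dice_loss(probs, targets, num_classes))


if __name__ == "__main__":
    # Two pixels, two classes, a maximally uncertain prediction.
    uniform = [[0.5, 0.5], [0.5, 0.5]]
    print(combined_loss(uniform, [0, 1], num_classes=2))
```

In a real training run the same weighting would be applied to batched tensors and to each supervised output of the network; this sketch only shows how the two terms are combined.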