Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Scale Equalization for Multi-Level Feature Fusion
Authors: Bum Jun Kim, Sang Woo Kim
TMLR 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments showed that adopting scale equalizers consistently improved the mIoU index across various target datasets, including ADE20K, PASCAL VOC 2012, and Cityscapes, as well as various decoder choices, including UPerHead, PSPHead, ASPPHead, SepASPPHead, and FCNHead. We observed that injecting scale equalizers into multi-stage feature fusion improved the mIoU index compared with the same models without scale equalization (Table 2). |
| Researcher Affiliation | Academia | Bum Jun Kim (EMAIL), Department of Electrical Engineering, Pohang University of Science and Technology; Sang Woo Kim (EMAIL), Department of Electrical Engineering, Pohang University of Science and Technology |
| Pseudocode | Yes | Algorithm 1 Efficient Implementation via Initialization |
| Open Source Code | No | The paper discusses modifications to existing architectures and their efficient implementation, but it does not provide a direct link or explicit statement about the release of their own source code for the described methodology. It mentions using 'MMSegmentation: Open MMLab Semantic Segmentation Toolbox and Benchmark. https://github.com/open-mmlab/mmsegmentation, 2020.' which is a third-party tool. |
| Open Datasets | Yes | Experiments showed that adopting scale equalizers consistently improved the mIoU index across various target datasets, including ADE20K, PASCAL VOC 2012, and Cityscapes, as well as various decoder choices, including UPerHead, PSPHead, ASPPHead, SepASPPHead, and FCNHead. The ADE20K dataset contains scene-centric images along with the corresponding segmentation labels. The same goes for the PASCAL VOC 2012 dataset with 21 categories, and we followed the augmented PASCAL VOC 2012 dataset. The Cityscapes dataset contains images of urban street scenes along with the corresponding segmentation labels. Using the KITTI dataset (Geiger et al., 2013), we trained the model with and without scale equalizers in the feature fusion module (Table 5). |
| Dataset Splits | No | The paper describes data augmentation and preprocessing steps (e.g., crop size, random resize, random flipping, photometric distortions) for the datasets used (ADE20K, PASCAL VOC 2012, Cityscapes, KITTI), but it does not explicitly state the training, validation, and test dataset splits by percentages or sample counts in the main text. |
| Hardware Specification | No | The training was conducted on a 4 GPU machine, and SyncBN (Zhang et al., 2018) was used for distributed training. |
| Software Dependencies | No | The paper mentions 'MMSegmentation (Contributors, 2020)' and various optimizers like 'AdamW' and 'stochastic gradient descent with momentum', along with 'SyncBN', but does not provide specific version numbers for any key software libraries, frameworks, or programming languages used (e.g., Python, PyTorch, CUDA, etc.). |
| Experiment Setup | Yes | For training with Swin and Twins encoders, the AdamW optimizer (Loshchilov & Hutter, 2019) with weight decay 10^-2, betas β1 = 0.9, β2 = 0.999, and learning rate 6e-5 with polynomial decay on the 160K schedule after linear warmup was used. For training with ConvNeXt encoders, the AdamW optimizer with weight decay 5e-2, betas β1 = 0.9, β2 = 0.999, learning rate 10^-4 with polynomial decay on the 160K schedule after linear warmup, and mixed precision training (Micikevicius et al., 2018) were used. The training was conducted on a 4 GPU machine, and SyncBN (Zhang et al., 2018) was used for distributed training. We measured the mean intersection over union (mIoU) and reported the average of five runs with different random seeds. For training on the Cityscapes dataset, stochastic gradient descent with momentum 0.9, weight decay 5e-4, and learning rate 10^-2 with polynomial decay on the 80K schedule were used. |
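The learning-rate schedule quoted above (linear warmup followed by polynomial decay over a 160K-iteration budget, as commonly configured in MMSegmentation) can be sketched as a plain function. This is an illustrative sketch, not code from the paper: the warmup length (1500 iterations) and decay power (1.0) are assumptions taken from typical MMSegmentation defaults, while the base learning rate and total steps mirror the Swin/Twins setting reported in the table.

```python
def lr_at_step(step, base_lr=6e-5, warmup_steps=1500,
               total_steps=160_000, power=1.0, min_lr=0.0):
    """Learning rate at a given iteration: linear warmup, then polynomial decay.

    Assumed values (not stated in the paper): warmup_steps=1500, power=1.0.
    base_lr=6e-5 and total_steps=160_000 follow the quoted Swin/Twins setup.
    """
    if step < warmup_steps:
        # Linear warmup from ~0 up to base_lr.
        return base_lr * (step + 1) / warmup_steps
    # Polynomial decay from base_lr down to min_lr over the remaining steps.
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return min_lr + (base_lr - min_lr) * (1.0 - progress) ** power
```

With power=1.0 this reduces to a linear ramp-down after warmup; the ConvNeXt and Cityscapes settings in the table would use the same shape with base_lr=10^-4 / 160K and base_lr=10^-2 / 80K respectively.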